Load packages:
Resources used to create this lecture:
Video from Will Doyle, Professor at Vanderbilt University
What is version control?
How version control works:
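Suppose you write a document and save it as cookies.txt
Later, you make changes to cookies.txt (e.g., add alternative baking time for people who like “soft and chewy” cookies)
With version control, you do not save an entirely new copy of cookies.txt; rather, you save the changes made relative to the previous version of cookies.txt
Why use version control when you can just save new version of document?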
Credit: Jorge Chan (and also, lifted this example from Benjamin Skinner’s intro to Git/GitHub lecture)
What is Git? (from git website)
“Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency”
What is a Git repository?
What is GitHub?
“Whoah, I’ve just read this quick tutorial about git and oh my god it is cool. I feel now super comfortable using it, and I’m not afraid at all to break something.”— said no one ever (de Wulf)
Understanding and learning how to use Git and GitHub can be intimidating. A lot of tutorials give you recipes for how to accomplish specific tasks (either point-and-click or issuing commands on command line), but don’t provide a conceptual understanding of how things work.
Here is how we will learn Git and GitHub over the course of the quarter:
What is a shell?
What is graphical user interface (GUI)?
In this course, we will perform Git operations solely using the command line. Why?
Background information on the Unix shell “Bash”
We will use the Unix shell called “Bash” to perform Git operations:
Why learn the command line and “command-line bullshittery,” from Philip J. Guo
“What is wonderful about doing applied computer science research in the modern era is that there are thousands of pieces of free software and other computer-based tools that researchers can leverage to create their research software. With the right set of tools, one can be 10x or even 100x more productive than peers who don’t know how to set up those tools.”
“But this power comes at a great cost: It takes a tremendous amount of command-line bullshittery to install, set up, and configure all of this wonderful free software. What I mean by command-line bullshittery is dealing with all of the arcane, obscure, strange bullshit of the command-line paradigm that most of these free tools are built upon….So perhaps what is more important to a researcher than programming ability is adeptness at dealing with command-line bullshittery, since that enables one to become 10x or even 100x more productive than peers by finding, installing, configuring, customizing, and remixing the appropriate pieces of free software.”
Helping my students overcome command-line bullshittery by Philip J. Guo
If you have a Windows computer, you will need to follow these steps to install Git for Windows, which will allow you to run Bash and Git commands. If you have a Mac, you won’t need to download anything because it already comes with a Terminal app. However, on newer versions of macOS, you may need to run xcode-select --install in your Terminal before you’re able to use Git commands (see here for more info).
In RStudio, there is a Terminal tab (next to the Console tab) where you can run Bash commands and perform Git operations:
Credit: RStudio Terminal blog post by Gary Ritchie
If you are working from an R markdown file, you can also create bash code chunks (similar to R code chunks) for running shell commands. All you need to do is indicate {bash} for the code chunk:
What is the difference between the RStudio Console and Terminal?
In this section, we will go over some of the commonly used command line commands. You can run these commands either in your RStudio Terminal or in a bash code chunk of an R markdown file.
Generally, you can pull up the help file for a command by running:
command_name --help (Windows)
man command_name (Mac)
We’ll use the ls command as an example:
ls: List directory contents
ls [<option(s)>] [<directory_name(s)>]
[] indicates they are optional and you do not have to specify these
Options are specified using - or -- (see help file)
- is the way to specify the short name version and -- is the way to specify the long name version of an option
-a: Include directory entries whose names begin with a dot (.)
-l: List files in long format (i.e., include additional information like file size, date of last modification, etc.)
directory_name(s): Which directories to list the content of (default: current directory)
The R equivalent is list.files()
Example: Using ls to list content in current directory (default)
## git_and_github.Rmd
## git_and_github.html
## render_toc.R
Example: Using ls to list content in parent directory
## apis_and_json
## ggplot
## git_and_github
## organizing_and_io
## programming
## strings_and_regex
Example: Using ls -a to list content in parent directory including entries whose names begin with a dot
## .
## ..
## apis_and_json
## ggplot
## git_and_github
## organizing_and_io
## programming
## strings_and_regex
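The three ls examples above can be tried in a throwaway directory. This is a minimal sketch, assuming a scratch directory made with mktemp; the names demo_dir, lectures, notes.txt, and .hidden_file are hypothetical and are not part of the lecture repository:

```shell
# Create a scratch directory (all names here are hypothetical)
demo_dir=$(mktemp -d)
mkdir "$demo_dir/lectures"
touch "$demo_dir/lectures/notes.txt" "$demo_dir/lectures/.hidden_file"

# Default: entries whose names begin with a dot are not listed
ls "$demo_dir/lectures"

# -a: include entries whose names begin with a dot (also lists . and ..)
ls -a "$demo_dir/lectures"

# -l: long format (permissions, owner, size, modification time)
ls -l "$demo_dir/lectures"

# Capture the listings so they can be compared
plain_listing=$(ls "$demo_dir/lectures")
all_listing=$(ls -a "$demo_dir/lectures")
```

Comparing plain_listing with all_listing shows which entries only appear when -a is given.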
echo: Write to standard output (i.e., print to terminal)
echo <text_to_print>
Run help echo to access the help file on Windows
text_to_print: Text to print to terminal
>: Redirect the output to a file, overwriting any existing content of the file
>>: Append the output to a file (i.e., not overwrite existing content of file)
cat: Concatenate and print files
cat <file_name>
file_name: File to print to terminal
Example: Using echo and > to redirect text to file and cat to print content of file
## Hello, World!
# We would overwrite contents of file when using `>`
echo "library(tidyverse)" > my_script.R
# Print contents of file
cat my_script.R
## library(tidyverse)
Example: Using echo and >> to append text to file and cat to print content of file
# Append line to R script by using `>>` (`>` would overwrite contents of file)
echo "mpg %>% head(5)" >> my_script.R
# Print contents of file
cat my_script.R
## library(tidyverse)
## mpg %>% head(5)
head: Print first part of file
head [<option(s)>] [<file_name>]
-n <int>: Print the first <int> lines (default: 10)
file_name: File to print
tail: Print last part of file
tail [<option(s)>] [<file_name>]
-n <int>: Print the last <int> lines (default: 10)
file_name: File to print
Example: Using head to print first part of file
## library(tidyverse)
## mpg %>% head(5)
## library(tidyverse)
Example: Using tail to print last part of file
## library(tidyverse)
## mpg %>% head(5)
## mpg %>% head(5)
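Because my_script.R only has two lines, the default of 10 lines is hard to see in the examples above. A small sketch with a longer file may help; the 12-line file and the variable names are hypothetical:

```shell
# Build a 12-line scratch file so the 10-line default is visible
demo_file=$(mktemp)
for i in 1 2 3 4 5 6 7 8 9 10 11 12; do
  echo "line $i" >> "$demo_file"
done

# Default: first 10 lines / last 10 lines
head "$demo_file"
tail "$demo_file"

# -n <int>: first <int> lines / last <int> lines
first_two=$(head -n 2 "$demo_file")
last_one=$(tail -n 1 "$demo_file")
echo "$first_two"
echo "$last_one"
```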
cp: Copies files or directories
cp [<option(s)>] [<source_file/directory>] [<destination_file/directory>]
-r: Copies directories and their contents recursively (this flag is required to copy a directory)
Copies source_file/directory to destination_file/directory
Example: Using cp to copy a file
## library(tidyverse)
## mpg %>% head(5)
# Make a copy of my_script.R called my_script_copy.R inside my_folder/
cp my_script.R my_folder/my_script_copy.R
# Print contents of my_script_copy.R
cat my_folder/my_script_copy.R
## library(tidyverse)
## mpg %>% head(5)
Example: Using cp -r to copy a directory
## my_script_copy.R
## test_script.R
# Make a copy of my_folder/ (with its contents) called my_folder_copy/
cp -r my_folder my_folder_copy
# View contents of my_folder_copy/
ls my_folder_copy
## my_script_copy.R
## test_script.R
mv: Rename or move files
mv [<old_file/directory>] [<new_file/directory>]
mv [<file/directory(s)>] [<destination_directory>]
Example: Using mv to rename a file or directory
Example: Using mv to move files and directories into a directory
## my_script_copy.R
## test_script.R
# Move file and directory into the destination directory (last arg)
mv create_dataset.R my_folder_2 my_folder
# View contents of my_folder/
ls my_folder
## create_dataset.R
## my_folder_2
## my_script_copy.R
## test_script.R
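The cp and mv behaviors described above (copy, recursive copy, rename, and move into a directory) can be sketched in one pass. This assumes a scratch directory from mktemp; the name renamed_script.R is hypothetical:

```shell
# Scratch directory with one file and one sub-directory
work=$(mktemp -d)
echo "library(tidyverse)" > "$work/my_script.R"
mkdir "$work/my_folder"

# Copy a file into a directory under a new name
cp "$work/my_script.R" "$work/my_folder/my_script_copy.R"

# Copy a directory and its contents (-r is required for directories)
cp -r "$work/my_folder" "$work/my_folder_copy"

# Rename: the last argument is a new name
mv "$work/my_script.R" "$work/renamed_script.R"

# Move: the last argument is an existing directory
mv "$work/renamed_script.R" "$work/my_folder"

# Inspect the result
ls "$work" "$work/my_folder" "$work/my_folder_copy"
```

Note that mv decides between renaming and moving based on whether the last argument is an existing directory.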
This section introduces some core concepts and explains the basic Git “workflow” (i.e., how Git works)
Version control systems that save differences:
Suppose you create a file called twinkle.txt
“Version 1” of twinkle.txt has the following contents:
twinkle, twinkle, little star
You then make changes to twinkle.txt and save those changes, resulting in “Version 2,” which has the following contents:
twinkle, twinkle, little star, how I wonder what you are!
When saving “Version 2” of twinkle.txt, centralized version control systems don’t store the entire file. Rather, they store the changes relative to the previous version. In our example, “Version 2” stores:
, how I wonder what you are!
Credit: Getting Started - What is Git
Git stores data as snapshots rather than differences:
Credit: Getting Started - What is Git
What is a commit?
Credit: Lucas Maurer, medium.com
Example workflow (using cookies.txt):
You edit cookies.txt in a text editor. These are changes made in your local working directory.
You are done editing cookies.txt and you want to commit those changes to your repository
Steps to commit the changes to cookies.txt:
Credit: Modified from Atlassian, Git push
Credit: Simon Maple, JRebel, https://www.jrebel.com/blog/git-cheat-sheet
Git commands:
add: Add file from working directory to staging area
commit: Commit file from staging area to local repository
push: Send files from local repository (your machine) to remote repository
Think of push as “uploading”
fetch: Get files from remote repository and put them in local repository
pull: Get files from remote repository and put them in the working directory
Think of pull as “downloading”
pull is effectively fetch followed by merge (discussed later)
reset: After you add files from working directory to staging area, reset unstages those files
Git command cheatsheets:
When performing git operations on command line, all commands begin with git, for example:
git init
git clone url_of_remote_repository
git status
For an overview of git command syntax and a list of common git commands, type this in command line:
git --help
To see the help file for a particular git command (e.g., add, commit, clone), type git command_name --help. For example:
git add --help
Basic/essential git commands:
git init
“It adds a hidden subfolder [.git/] within the existing directory that houses the internal data structure required for version control” (Git Handbook)
git clone url_of_remote_repository
git add file_name(s)
git commit -m "commit message"
-m is an option to the git commit command, which specifies that you will add a brief description about changes you are committing. You can reference an issue in the commit message by using a hashtag followed by the issue number: #<issue_number>. These commits will appear on the issue page.
git status
git push
git pull
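The local half of this workflow (add, commit, and reset) can be tried without a remote. The following is a sketch only: it assumes a scratch repository made with mktemp, and the file name analysis.R and the user identity are hypothetical placeholders set for this repository alone:

```shell
# Scratch repository; git -C runs each command inside it without cd
repo=$(mktemp -d)
git init --quiet "$repo"
# Hypothetical identity, stored only in this scratch repository
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"

echo "library(tidyverse)" > "$repo/analysis.R"

# add: working directory -> staging area
git -C "$repo" add analysis.R
staged=$(git -C "$repo" diff --cached --name-only)

# commit: staging area -> local repository
git -C "$repo" commit --quiet -m "Add analysis script"
n_commits=$(git -C "$repo" rev-list --count HEAD)

# Stage another change, then unstage it with reset
echo "mpg %>% head(5)" >> "$repo/analysis.R"
git -C "$repo" add analysis.R
git -C "$repo" reset --quiet -- analysis.R
staged_after_reset=$(git -C "$repo" diff --cached --name-only)

# The change is still in the working directory, just no longer staged
git -C "$repo" status --short
```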
What are local and remote repositories?
There are 2 basic ways to get your local repository set up with a remote:
git remote: Show list of connected remote repositories
git remote --help
git remote [<option(s)>]
-v: Show more detailed info about the remotes, including their URLs
Understanding how local and remote repositories are connected:
Use git remote to check which remote repository is connected (i.e., which remote(s) you can push to and pull from)
Obtain the URL of the remote repository on GitHub:
Initialize this repository with options before creating
Code button
Clone the repository to your local machine:
Use the git clone command to clone the repository to your local machine
git add changes to file(s) from the local working directory to the staging area
git commit -m "commit message" all staged changes to the local repository
git push to push changes from your local repository to the remote repository
Credit: W3 docs, Git clone
git clone: Clone a repository into a new directory
git clone --help
git clone <repo_url>
repo_url can be the HTTPS or SSH URL
Example: Using git clone to clone a repository
HTTPS URL: https://github.com/btskinner/downloadipeds.git
SSH URL: git@github.com:btskinner/downloadipeds.git
# Change directory to where you want to clone the repository
cd ~
# This will be the directory where the `downloadipeds` repository will be cloned
# Note that you do not need to create a `downloadipeds` sub-directory yourself
pwd
## /Users/cyouh95
# Clone the remote repository
git clone https://github.com/btskinner/downloadipeds.git # HTTPS URL
# git clone git@github.com:btskinner/downloadipeds.git # SSH URL
## Cloning into 'downloadipeds'...
# Change directory to the newly cloned `downloadipeds`
cd downloadipeds
pwd
# List out contents of repository
ls -la
## /Users/cyouh95/downloadipeds
## total 80
## drwxr-xr-x 8 cyouh95 staff 272 Feb 19 11:11 .
## drwxr-xr-x+ 111 cyouh95 staff 3774 Feb 19 11:11 ..
## drwxr-xr-x 12 cyouh95 staff 408 Feb 19 11:11 .git
## -rw-r--r-- 1 cyouh95 staff 20 Feb 19 11:11 .gitignore
## -rw-r--r-- 1 cyouh95 staff 1073 Feb 19 11:11 LICENSE
## -rw-r--r-- 1 cyouh95 staff 4541 Feb 19 11:11 README.md
## -rwxr-xr-x 1 cyouh95 staff 5847 Feb 19 11:11 downloadipeds.R
## -rwxr-xr-x 1 cyouh95 staff 12296 Feb 19 11:11 ipeds_file_list.txt
## origin
## origin https://github.com/btskinner/downloadipeds.git (fetch)
## origin https://github.com/btskinner/downloadipeds.git (push)
Alternatively, you can create a new git repository on your local machine, and then connect it to the remote on GitHub.
Create a local git repository:
git init
git add changes to file(s) from the local working directory to the staging area
git commit -m "commit message" all staged changes to the local repository
git branch -M main
Create a remote repository on GitHub:
Initialize this repository with options
Connect your local repository to the remote:
Use git remote add to add a new remote for your local repository
Use the --set-upstream option with the git push command
Credit: Java T Point, Git Push
git remote: Add or modify a remote repository
git remote --help
git remote add <remote_name> <remote_url>: Add a new remote
remote_name: Name we choose to call our remote repository, conventionally origin
remote_url: HTTPS/SSH URL of remote repository
git remote set-url <remote_name> <remote_url>: Update the URL for the specified remote
remote_name: Name of the remote we want to update URL for
remote_url: HTTPS/SSH URL we want to update to
git push: Set and push to upstream branch
git push --help
git push --set-upstream <remote_name> <branch_name>
remote_name: Name of the remote repository to push to
branch_name: Name of the remote branch you want your current branch to track
Once the upstream branch is set, later pushes only need git push.
Example: Full sample workflow
# CREATING AND CHANGING DIRECTORIES
cd ~ # change directories to home directory
#cd documents # change to "documents" [if necessary]
ls # list files in directory
# make new directory that will be our git repository
# rm -rf gitr_practice # remove if it exists
mkdir gitr_practice
cd gitr_practice # move to new directory
ls -a # show all files in directory
# INITIALIZING GIT REPOSITORY
# turn the current, empty directory into a fresh Git repository
git init
ls -a # show all files in directory
# CHANGING FILES IN WORKING DIRECTORY
# create a new README file with some sample text
echo "Hello. I thought we would be learning R this quarter" >> README.txt
# view the file README.txt
cat README.txt
# create a simple R script
echo "library(tidyverse)" >> simple_script.r
echo "mpg %>% head(5)" >> simple_script.r # add another line to simple_script.r
cat simple_script.r # show contents of file simple_script.r
# STAGE AND COMMIT FILES TO LOCAL REPOSITORY
# check status of git repository
git status
# add README.txt from working directory to staging area (will now become a file that is "tracked" by git)
git add README.txt
# add simple_script.r from working directory to staging area (will now become a file that is "tracked" by git)
git add simple_script.r
# check status
git status
# commit changes to local repository
git commit -m "Initial commit, README.txt simple_script.r"
git status
# CONNECT AND PUSH TO REMOTE REPOSITORY
# rename default branch name
git branch -M main
# provide the path for the repository you created on GitHub in the first step
#git remote add origin https://github.com/YOUR-USERNAME/YOUR-REPOSITORY.git
git remote add origin https://github.com/ozanj/gitr_practice.git
# push changes to GitHub
git push --set-upstream origin main
Example: Using git remote to add a remote
# Initialize a new git repository in `my_git_repo` directory
cd my_git_repo
git init
# Add remote (https://github.com/anyone-can-cook/my_git_repo) and name it `origin`
git remote add origin https://github.com/anyone-can-cook/my_git_repo.git
# Check remote
git remote -v
## Initialized empty Git repository in /Users/cyouh95/my_git_repo/.git/
## origin https://github.com/anyone-can-cook/my_git_repo.git (fetch)
## origin https://github.com/anyone-can-cook/my_git_repo.git (push)
Note that we could’ve named the remote repository anything - it doesn’t have to be origin:
# Add remote (https://github.com/anyone-can-cook/my_git_repo) and name it `my_remote`
git remote add my_remote https://github.com/anyone-can-cook/my_git_repo.git
# Check remote
git remote -v
## my_remote https://github.com/anyone-can-cook/my_git_repo.git (fetch)
## my_remote https://github.com/anyone-can-cook/my_git_repo.git (push)
Example: Using git remote to update URL for a remote
## my_remote https://github.com/anyone-can-cook/my_git_repo.git (fetch)
## my_remote https://github.com/anyone-can-cook/my_git_repo.git (push)
# Change the URL for the remote named `my_remote`
git remote set-url my_remote https://github.com/anyone-can-cook/my_git_repo_2.git
# Check remote
git remote -v
## my_remote https://github.com/anyone-can-cook/my_git_repo_2.git (fetch)
## my_remote https://github.com/anyone-can-cook/my_git_repo_2.git (push)
Example: Using git push to push a new branch
# Create new R script
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(5)" >> create_dataset.R
# Add R script and make a commit
git add create_dataset.R
git commit -m "initial commit"
# Because this is a new local branch, we get an error if we just use `git push` on the initial push
git push
## fatal: The current branch main has no upstream branch.
## To push the current branch and set the remote as upstream, use
##
## git push --set-upstream my_remote main
As hinted in the error message, we need to use the --set-upstream option to set upstream branch on the initial push for a new local branch:
## my_remote https://github.com/anyone-can-cook/my_git_repo_2.git (fetch)
## my_remote https://github.com/anyone-can-cook/my_git_repo_2.git (push)
# We can check status to see that we are currently on the `main` branch
# (Note that because we have yet to set an upstream branch,
# it does not say our main branch is ahead of remote by 1 commit)
git status
## On branch main
## nothing to commit, working tree clean
# Use the `--set-upstream` option with the remote and branch names to push new local branch
git push --set-upstream my_remote main
## To https://github.com/anyone-can-cook/my_git_repo_2.git
## * [new branch] main -> main
## Branch main set up to track remote branch main from my_remote.
# Check status
# (Now that we have set the upstream branch,
# it says our main branch is up-to-date with the remote's main branch)
git status
## On branch main
## Your branch is up-to-date with 'my_remote/main'.
##
## nothing to commit, working tree clean
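The upstream behavior shown above can also be tried without a GitHub account. This sketch assumes a local bare repository as a stand-in for the GitHub remote; the paths and the user identity are hypothetical:

```shell
base=$(mktemp -d)

# Assumption: a bare repository on disk stands in for the GitHub remote
git init --quiet --bare "$base/remote.git"

# Local repository with one commit on a branch named main
git init --quiet "$base/local"
git -C "$base/local" config user.name "Example User"
git -C "$base/local" config user.email "user@example.com"
echo "library(tidyverse)" > "$base/local/create_dataset.R"
git -C "$base/local" add create_dataset.R
git -C "$base/local" commit --quiet -m "initial commit"
git -C "$base/local" branch -M main

# Connect the local repository to the stand-in remote
git -C "$base/local" remote add my_remote "$base/remote.git"

# Initial push: name the remote and branch, and record the upstream
git -C "$base/local" push --quiet --set-upstream my_remote main

# Show which remote branch main now tracks
upstream=$(git -C "$base/local" rev-parse --abbrev-ref "main@{upstream}")
echo "$upstream"
```

After this, a plain git push from the local repository no longer reports a missing upstream branch.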
Once a directory is initialized as a git repository, you can choose to track the changes to any file in the directory:
A file becomes tracked once you add it (i.e., git add)
git status can be used to check which files are tracked and which are not. Untracked files, except those listed in your .gitignore file, will be listed under Untracked files.
What is a .gitignore file? (see below for more details)
Files listed in .gitignore are not shown under Untracked files when you check git status
You can create a .gitignore file yourself or click Add .gitignore when you are creating a new repository on GitHub and select the R template from the dropdown menu:
Credit: How to Make Git Forget Tracked Files Now In gitignore
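To see the effect of a .gitignore file on git status, here is a small sketch. The file names and the *.log pattern are hypothetical choices for illustration and are not taken from GitHub's R template:

```shell
repo=$(mktemp -d)
git init --quiet "$repo"

# Two new files; neither is tracked yet
touch "$repo/analysis.R" "$repo/scratch.log"

# Ignore every file whose name ends in .log
echo "*.log" > "$repo/.gitignore"

# --porcelain prints a short, script-friendly version of git status
untracked=$(git -C "$repo" status --porcelain)
echo "$untracked"

# check-ignore prints the path if it matches an ignore rule
ignored=$(git -C "$repo" check-ignore scratch.log)
echo "$ignored"
```

Here analysis.R and .gitignore itself show up as untracked, while scratch.log does not.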
Below are some common git commands you might use to observe your repository:
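git status
git status: Shows the working tree status
git status --help
git status [<option(s)>]
Changes to be committed
Files that have been staged using git add and will be included in the next git commit
Changes not staged for commit
Tracked files (i.e., files you have used git add on before) that have since been changed (e.g., modified, deleted) in the working directory; stage the changes using git add
Untracked files
Files that are not yet tracked (i.e., you have never used git add on them before); start tracking them using git add
Below is a sample output of git status: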
On branch main
Your branch is up-to-date with 'origin/main'.
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
new file: clean_dataset.R
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: create_dataset.R
Untracked files:
(use "git add <file>..." to include in what will be committed)
analyze_dataset.R
Example: Checking git status after creating a new file
Create a new file create_dataset.R in your git repository; it will be listed under Untracked files
# Create new R script
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(5)" >> create_dataset.R
git status
On branch main
Your branch is up-to-date with 'origin/main'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
create_dataset.R
nothing added to commit but untracked files present (use "git add" to track)
Example: Checking git status after adding a file
After you add create_dataset.R, you will see it listed under Changes to be committed
On branch main
Your branch is up-to-date with 'origin/main'.
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
new file: create_dataset.R
Example: Checking git status after making a commit
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
(use "git push" to publish your local commits)
nothing to commit, working tree clean
Example: Checking git status after modifying a tracked file
A modified tracked file is listed under Changes not staged for commit (as compared to under Untracked files when it’s never been tracked before)
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
(use "git push" to publish your local commits)
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: create_dataset.R
no changes added to commit (use "git add" and/or "git commit -a")
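The four states walked through above can be reproduced in a scratch repository. This sketch uses git status --porcelain, a compact script-friendly form of the same information, rather than the full output shown in the examples; the user identity is a hypothetical placeholder:

```shell
repo=$(mktemp -d)
git init --quiet "$repo"
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"

# New file: untracked (code ??)
echo "library(tidyverse)" > "$repo/create_dataset.R"
s_untracked=$(git -C "$repo" status --porcelain)

# After git add: staged as a new file (code A in the first column)
git -C "$repo" add create_dataset.R
s_staged=$(git -C "$repo" status --porcelain)

# After git commit: nothing to report
git -C "$repo" commit --quiet -m "initial commit"
s_clean=$(git -C "$repo" status --porcelain)

# After editing the tracked file: modified, not staged (code M in the second column)
echo "mpg %>% head(5)" >> "$repo/create_dataset.R"
s_modified=$(git -C "$repo" status --porcelain)

printf '%s\n' "$s_untracked" "$s_staged" "$s_modified"
```

The first column of each code refers to the staging area and the second column to the working directory.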
git log
git log: Show commit logs
git log --help
git log [<option(s)>]
-n <int>: Show the latest <int> commits
commit <commit_hash>: Each commit can be uniquely identified by its hash ID (SHA-1)
Author: <username> <email>: Username and email of the author of the commit
Date: <commit_date>: Date of the commit
<commit_message>: Commit message
Press q to exit this read mode.
Below is a sample output of git log:
commit 2e525e4b1c40f6cffb78438285a00cd7eed54ae0 (HEAD -> main)
Author: username <email@example.com>
Date: Thu Apr 2 23:53:30 2020 -0700
second commit
commit 8c20a14b99d7a490580045176287b979c93d9cb5
Author: username <email@example.com>
Date: Wed Apr 1 22:49:52 2020 -0700
initial commit
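A log like the sample above can be generated in a scratch repository. In this sketch the user identity is a hypothetical placeholder, and --no-pager is used so the log prints directly instead of opening the read mode:

```shell
repo=$(mktemp -d)
git init --quiet "$repo"
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"

# Two commits, mirroring the sample output
echo "library(tidyverse)" > "$repo/create_dataset.R"
git -C "$repo" add create_dataset.R
git -C "$repo" commit --quiet -m "initial commit"

echo "mpg %>% head(5)" >> "$repo/create_dataset.R"
git -C "$repo" add create_dataset.R
git -C "$repo" commit --quiet -m "second commit"

# -n 1: show only the latest commit
git --no-pager -C "$repo" log -n 1

# --oneline: abbreviated hash and message, newest first
git --no-pager -C "$repo" log --oneline

# Pull out single pieces of information for scripting
latest_message=$(git -C "$repo" log -n 1 --format=%s)
n_commits=$(git -C "$repo" rev-list --count HEAD)
```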
git diff
git diff: Show changes between files, commits, etc.
git diff --help
git diff [<file_name(s)>]: Show changes made to unstaged files in working directory compared to the “index”
These are the changes that would be staged if you git add them
This only covers tracked files (i.e., files listed under Changes not staged for commit when you check git status), since untracked files have no history in the “index” to compare against
If no file_name(s) specified, git diff shows changes made to all tracked, unstaged files
git diff --cached [<file_name(s)>]: Show changes made to added files in staging area compared to the last commit
These are the changes that would be committed if you run the git commit command
If no file_name(s) specified, git diff --cached shows changes made to all staged files (i.e., files listed under Changes to be committed when you check git status)
git diff <commit_hash> <commit_hash> [<file_name(s)>]: Show changes between the two specified commits
If no file_name(s) specified, git diff <commit_hash> <commit_hash> shows changes between all files
Output of git diff
The first line is diff --git a/<file_name> b/<file_name>, which indicates that two versions of file_name are being compared
The next line(s) give metadata about the file (e.g., index) or indicate if a new file is involved (as in the case of git diff --cached for an untracked, staged file – see second example below)
Each block of changes starts with a line enclosed in @@
- in front of a line indicates that the line has been removed in b/<file_name> as compared to a/<file_name>
+ in front of a line indicates that the line has been added in b/<file_name> as compared to a/<file_name>
Below is a sample output of git diff:
diff --git a/create_dataset.R b/create_dataset.R
index c1cff38..5ea84e9 100644
--- a/create_dataset.R
+++ b/create_dataset.R
@@ -1,2 +1,2 @@
library(tidyverse)
-mpg %>% head(5)
+mpg %>% filter(year == 2008)
Example: Checking git diff for an untracked file
Create a new file create_dataset.R in your git repository
Because the file is untracked, nothing is shown for it by git diff
Example: Checking git diff for a staged file
After you add create_dataset.R, it will be added to the “index”
git diff --cached can be used to view all staged changes
diff --git a/create_dataset.R b/create_dataset.R
new file mode 100644
index 0000000..8b151a2
--- /dev/null
+++ b/create_dataset.R
@@ -0,0 +1 @@
+library(tidyverse)
Example: Checking git diff for a modified, tracked file
After modifying the tracked file, use git diff to see changes between the versions in the working directory and the staging area
diff --git a/create_dataset.R b/create_dataset.R
index 8b151a2..c1cff38 100644
--- a/create_dataset.R
+++ b/create_dataset.R
@@ -1 +1,2 @@
library(tidyverse)
+mpg %>% head(5)
Example: Checking git diff after committing changes
Commit the staged changes (i.e., the line library(tidyverse) in create_dataset.R)
The output of git diff (i.e., comparing changes between the working directory and “index”) is the same as the previous example, when the changes were just staged and not yet committed
diff --git a/create_dataset.R b/create_dataset.R
index 8b151a2..c1cff38 100644
--- a/create_dataset.R
+++ b/create_dataset.R
@@ -1 +1,2 @@
library(tidyverse)
+mpg %>% head(5)
Example: Checking git diff between commits
Add the changes made to create_dataset.R in the working directory (i.e., the line mpg %>% head(5)) and make a second commit
# Add create_dataset.R and make a commit
git add create_dataset.R
git commit -m "add 2nd line to create_dataset.R"
git log
commit aa89efba9adddf8547b3743ba81a421dd2a28881 (HEAD -> main)
Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
Date: Sat Apr 4 03:20:15 2020 -0700
add 2nd line to create_dataset.R
commit d5c6e0958fb173af04f7e2c5d5fd81457e8ffd0c
Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
Date: Sat Apr 4 03:11:38 2020 -0700
add 1st line to create_dataset.R
Use git diff to check the differences between the two commits by specifying their hash IDs
With the older commit specified first, the line mpg %>% head(5) has been added between the two commits
diff --git a/create_dataset.R b/create_dataset.R
index 8b151a2..c1cff38 100644
--- a/create_dataset.R
+++ b/create_dataset.R
@@ -1 +1,2 @@
library(tidyverse)
+mpg %>% head(5)
With the newer commit specified first, the line mpg %>% head(5) has been removed between the two commits
diff --git a/create_dataset.R b/create_dataset.R
index c1cff38..8b151a2 100644
--- a/create_dataset.R
+++ b/create_dataset.R
@@ -1,2 +1 @@
library(tidyverse)
-mpg %>% head(5)
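The three forms of git diff used in these examples can be run end to end in a scratch repository. This sketch assumes a hypothetical user identity and uses --name-only in places to keep the output short:

```shell
repo=$(mktemp -d)
git init --quiet "$repo"
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"

# First commit: one line
echo "library(tidyverse)" > "$repo/create_dataset.R"
git -C "$repo" add create_dataset.R
git -C "$repo" commit --quiet -m "add 1st line to create_dataset.R"
first=$(git -C "$repo" rev-parse HEAD)

# Modify the tracked file: git diff compares working directory with the index
echo "mpg %>% head(5)" >> "$repo/create_dataset.R"
unstaged=$(git -C "$repo" diff --name-only)

# Stage it: git diff --cached compares the index with the last commit
git -C "$repo" add create_dataset.R
staged=$(git -C "$repo" diff --cached --name-only)
unstaged_after_add=$(git -C "$repo" diff --name-only)

# Second commit
git -C "$repo" commit --quiet -m "add 2nd line to create_dataset.R"
second=$(git -C "$repo" rev-parse HEAD)

# Older commit first: the new line is an addition (+)
git --no-pager -C "$repo" diff "$first" "$second"
added_line=$(git -C "$repo" diff "$first" "$second" | grep '^+mpg')

# Newer commit first: the same line is a removal (-)
removed_line=$(git -C "$repo" diff "$second" "$first" | grep '^-mpg')
```

The order of the two hashes determines the direction of the comparison, which is why the same line appears once with + and once with -.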
.git/ directory
Every git repository that is created using git init contains a .git/ directory that “contains all the informations needed for git to work” (From Git series 1/3: Understanding git for real by exploring the .git directory):
## Initialized empty Git repository in /Users/cyouh95/my_git_repo/.git/
## total 0
## drwxr-xr-x 3 cyouh95 staff 102 Feb 19 11:11 .
## drwxr-xr-x+ 112 cyouh95 staff 3808 Feb 19 11:11 ..
## drwxr-xr-x 9 cyouh95 staff 306 Feb 19 11:11 .git
What’s inside the .git/ directory?
# List out the contents of the .git/ directory (in tree form)
find .git -print | sed -e 's;[^/]*/;|____;g;s;____|; |;g'
## .git
## |____config
## |____description
## |____HEAD
## |____hooks
## | |____applypatch-msg.sample
## | |____commit-msg.sample
## | |____post-update.sample
## | |____pre-applypatch.sample
## | |____pre-commit.sample
## | |____pre-push.sample
## | |____pre-rebase.sample
## | |____pre-receive.sample
## | |____prepare-commit-msg.sample
## | |____update.sample
## |____info
## | |____exclude
## |____objects
## | |____info
## | |____pack
## |____refs
## | |____heads
## | |____tags
We will be focusing on:
objects/: Directory containing all git objects
HEAD: Reference to the latest commit of the current branch
refs/: Directory containing the hash ID of commit referred to by HEAD
We’ll get into git objects starting in the next section, and see an example of HEAD and refs/ in a later section.
What is a git object?
Git objects are stored in the .git/objects directory
Each object is identified by a hash; the first 2 characters of the hash name the sub-directory of .git/objects that it is located in
Use the git cat-file command to view information about a git object whose hash you specify
Use git hash-object to compute (show) the hash for a git “blob” object based on the name of associated file
git cat-file: Provide content or type and size information for repository objects
git cat-file --help
git cat-file [<option(s)>] <object>
-p: Pretty-print the contents of <object> based on its type
-t: Instead of the content, show the object type identified by <object>
-s: Instead of the content, show the object size identified by <object>
There are 4 types of git objects (From The Git Object Model)
A blob is generally a file which stores data
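After a file is added, its blob object is stored in the .git/objects directory
The hash of the blob determines where it is located in .git/objects
The hash can be computed using the git hash-object command
# Create new R script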
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(5)" >> create_dataset.R
# Add R script
git add create_dataset.R
# View .git/objects directory
find .git/objects -print | sed -e 's;[^/]*/;|____;g;s;____|; |;g'
## |____objects
## | |____c1
## | | |____cff389562e8bc123e6691a60352fdf839df113
## | |____info
## | |____pack
git hash-object: Compute hash for a blob object from name of file
git hash-object --help
git hash-object <file_name>
We can use git hash-object to verify the hash for create_dataset.R:
## c1cff389562e8bc123e6691a60352fdf839df113
Example: Using git cat-file to view blob object content
## library(tidyverse)
## mpg %>% head(5)
Example: Using git cat-file to view blob object type
## blob
Example: Using git cat-file to view blob object size
## 35
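The blob examples above show only the output, so here is a sketch of commands that would produce this kind of output in a scratch repository. The variable names are hypothetical, and the hash is looked up rather than typed in:

```shell
repo=$(mktemp -d)
git init --quiet "$repo"

# Same two-line file as in the lecture example
echo "library(tidyverse)" > "$repo/create_dataset.R"
echo "mpg %>% head(5)" >> "$repo/create_dataset.R"
git -C "$repo" add create_dataset.R

# Hash computed from the file contents
blob_hash=$(git -C "$repo" hash-object create_dataset.R)

# Hash recorded in the index for the staged file
staged_hash=$(git -C "$repo" rev-parse :create_dataset.R)

# Content (-p), type (-t), and size in bytes (-s) of the blob
git -C "$repo" cat-file -p "$blob_hash"
obj_type=$(git -C "$repo" cat-file -t "$blob_hash")
obj_size=$(git -C "$repo" cat-file -s "$blob_hash")

# Location of the object file: first 2 hash characters, then the rest
obj_dir=$(echo "$blob_hash" | cut -c1-2)
obj_file=$(echo "$blob_hash" | cut -c3-)
ls "$repo/.git/objects/$obj_dir"
```

The size is the number of bytes in the file, including the newline at the end of each line.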
A tree is a directory that contains references to blobs (files) or other trees (sub-directories)
# Create a sub-directory
mkdir notes
# Add files to the sub-directory (since git doesn't track empty directories)
echo "This is my first set of notes." > notes/note_1.txt
echo "This is my second set of notes." > notes/note_2.txt
# Add new files
git add .
# View .git/objects directory
find .git/objects -print | sed -e 's;[^/]*/;|____;g;s;____|; |;g'
## |____objects
## | |____47
## | | |____6fb98775843929ca6c55b16b04752d973b3d2a
## | |____61
## | | |____08458417308ddc15d7390a2f8db50cf65ec399
## | |____c1
## | | |____cff389562e8bc123e6691a60352fdf839df113
## | |____info
## | |____pack
As seen, new blob objects are created for note_1.txt and note_2.txt since the files have been added (but tree objects will not be created until a commit has been made):
## This is my second set of notes.
## This is my first set of notes.
After the files have been committed, tree objects will be created for any sub-directories as well as for the root directory of the repository:
## |____objects
## | |____47
## | | |____6fb98775843929ca6c55b16b04752d973b3d2a
## | |____61
## | | |____08458417308ddc15d7390a2f8db50cf65ec399
## | |____6c
## | | |____f7bbf49af4f9fd5103cf9f0a3fa25226b12336
## | |____7b
## | | |____47cd62cfec78fd687bc222d7a5f938c96b711b
## | |____c1
## | | |____cff389562e8bc123e6691a60352fdf839df113
## | |____f5
## | | |____9085df29aed7826a89b23af3f67fc3ab96f643
## | |____info
## | |____pack
As we now see, the tree objects for the my_git_repo/ root directory and notes/ sub-directory exist, and another object has been created for the commit (more info on that in next section):
# View object type for my_git_repo/ and notes/ trees
git cat-file -t f59085d
git cat-file -t 6cf7bbf
# View object type for the commit
git cat-file -t $(git rev-parse --short HEAD) # git rev-parse retrieves latest commit hash
## tree
## tree
## commit
The content of a tree object is a list of all blobs (files) and other trees (sub-directories) in the directory. Each list entry follows the format:
<permission_code> <object_type> <object_hash> <object_name>
<permission_code>: Code indicating who has read/write access to the object
100644 (or 100755 if the file is executable) for blobs and 040000 for trees
<object_type>: Type of the object (i.e., blobs or trees)
<object_hash>: Reference to the object (i.e., the hash)
<object_name>: Name of the file or directory
Example: Using git cat-file to view tree object content for my_git_repo/ root directory
First, show files in directory using ls command with options -al:
## total 8
## drwxr-xr-x 5 cyouh95 staff 170 Feb 19 11:11 .
## drwxr-xr-x+ 112 cyouh95 staff 3808 Feb 19 11:11 ..
## drwxr-xr-x 12 cyouh95 staff 408 Feb 19 11:11 .git
## -rw-r--r-- 1 cyouh95 staff 35 Feb 19 11:11 create_dataset.R
## drwxr-xr-x 4 cyouh95 staff 136 Feb 19 11:11 notes
Second, show contents of tree using git cat-file:
# View type and content of my_git_repo/ tree object
git cat-file -t f59085d # type
git cat-file -p f59085d # content
## tree
## 100644 blob c1cff389562e8bc123e6691a60352fdf839df113 create_dataset.R
## 040000 tree 6cf7bbf49af4f9fd5103cf9f0a3fa25226b12336 notes
Example: Using git cat-file to view tree object content for notes/ sub-directory
# View type and content of notes/ tree object
git cat-file -t 6cf7bbf # type
git cat-file -p 6cf7bbf # content
## tree
## 100644 blob 6108458417308ddc15d7390a2f8db50cf65ec399 note_1.txt
## 100644 blob 476fb98775843929ca6c55b16b04752d973b3d2a note_2.txt
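The same kind of listing can be produced without looking up the tree hash first. The sketch below is a minimal illustration in a throwaway repository (the directory, identity, and file contents are placeholders, not the lecture's actual repository): `HEAD^{tree}` resolves to the root tree of the latest commit, and `git ls-tree -r` walks into sub-directory trees. Hashes depend on file contents, so none are shown here.

```shell
# Throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch `main` regardless of local defaults
git config user.name "Example User"
git config user.email "user@example.com"

mkdir notes
echo "library(tidyverse)" > create_dataset.R
echo "This is my first set of notes." > notes/note_1.txt
git add .
git commit -q -m "initial commit"

# Root tree: one blob entry (the R script) and one tree entry (notes/)
root_listing=$(git cat-file -p 'HEAD^{tree}')
echo "$root_listing"

# Recursive listing: every blob in the snapshot, with its full path
all_files=$(git ls-tree -r --name-only HEAD)
echo "$all_files"
```

Each line of the first listing follows the `<permission_code> <object_type> <object_hash> <object_name>` format described above.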
A commit object is created each time a commit is made. It contains information about the commit:
tree <tree_hash>
parent <commit_hash>
author <username> <email> <time>
committer <username> <email> <time>
<commit_message>
tree: Reference to the root directory tree object (i.e., “snapshot” of repository at the point of commit)
parent: Reference to the parent commit
Commit metadata (author, committer, commit_message)
All commits except for the initial commit will contain a reference to their parent commit. So let’s create a second commit:
# Modify R script
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
# Add R script
git add create_dataset.R
# Make another commit
git commit -m "second commit"
# View .git/objects directory
find .git/objects -print | sed -e 's;[^/]*/;|____;g;s;____|; |;g'
## [main 591d4e4] second commit
## 1 file changed, 1 insertion(+)
## |____objects
## | |____47
## | | |____6fb98775843929ca6c55b16b04752d973b3d2a
## | |____49
## | | |____0ec1c138021b8d5c196c26a2a7b3de69afc2d1
## | |____52
## | | |____4db779f0a3e3b3b353b522285c7da4830e21f1
## | |____59
## | | |____1d4e428e21e11ab90e871f669bb22e8315dadb
## | |____61
## | | |____08458417308ddc15d7390a2f8db50cf65ec399
## | |____6c
## | | |____f7bbf49af4f9fd5103cf9f0a3fa25226b12336
## | |____7b
## | | |____47cd62cfec78fd687bc222d7a5f938c96b711b
## | |____c1
## | | |____cff389562e8bc123e6691a60352fdf839df113
## | |____f5
## | | |____9085df29aed7826a89b23af3f67fc3ab96f643
## | |____info
## | |____pack
Example: Using git cat-file to view commit object content for first commit
# Retrieve commit hash for first commit
git rev-list HEAD | tail -n 1
# View content of the commit object
git cat-file -p $(git rev-list HEAD | tail -n 1)
## 7b47cd62cfec78fd687bc222d7a5f938c96b711b
## tree f59085df29aed7826a89b23af3f67fc3ab96f643
## author cyouh95 <25449416+cyouh95@users.noreply.github.com> 1613761892 -0800
## committer cyouh95 <25449416+cyouh95@users.noreply.github.com> 1613761892 -0800
##
## initial commit
Example: Using git cat-file to view commit object content for second commit
# Retrieve commit hash for latest commit
git rev-parse HEAD
# View content of the commit object
git cat-file -p $(git rev-parse HEAD)
## 591d4e428e21e11ab90e871f669bb22e8315dadb
## tree 524db779f0a3e3b3b353b522285c7da4830e21f1
## parent 7b47cd62cfec78fd687bc222d7a5f938c96b711b
## author cyouh95 <25449416+cyouh95@users.noreply.github.com> 1613761892 -0800
## committer cyouh95 <25449416+cyouh95@users.noreply.github.com> 1613761892 -0800
##
## second commit
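The parent link can also be checked programmatically. The sketch below is a minimal illustration in a throwaway repository (placeholder identity and file contents): it reads the `parent` line out of the second commit object and compares it with the hash of the first commit.

```shell
# Throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "initial commit"
echo "mpg %>% head(5)" >> create_dataset.R
git add create_dataset.R
git commit -q -m "second commit"

first=$(git rev-list --max-parents=0 HEAD)   # the root commit, i.e., the one without a parent
second=$(git rev-parse HEAD)

# The `parent` line stored in the second commit object
parent_of_second=$(git cat-file -p "$second" | sed -n 's/^parent //p')
echo "$parent_of_second"

# The first commit object has no `parent` line at all
first_parent_lines=$(git cat-file -p "$first" | grep -c '^parent ' || true)
echo "$first_parent_lines"
```

Following `parent` references from commit to commit like this is how `git log` walks back through the history.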
A tag object is created when an annotated tag is created (e.g., with git tag -a):
object <object_hash>
type <object_type>
tag <tag_name>
tagger <username> <email> <time>
<tag_message>
object: Reference to the tagged object
type: Object type of the tagged object (usually a commit)
Tag metadata (tag, tagger, tag_message)
Let’s create a tag for the current commit:
# Create a tag
git tag -a v1 -m "version 1.0"
# View .git/objects directory
find .git/objects -print | sed -e 's;[^/]*/;|____;g;s;____|; |;g'
## |____objects
## | |____47
## | | |____6fb98775843929ca6c55b16b04752d973b3d2a
## | |____49
## | | |____0ec1c138021b8d5c196c26a2a7b3de69afc2d1
## | |____52
## | | |____4db779f0a3e3b3b353b522285c7da4830e21f1
## | |____59
## | | |____1d4e428e21e11ab90e871f669bb22e8315dadb
## | |____5d
## | | |____fae1d0c9214175466659782ae44bcf84d96d52
## | |____61
## | | |____08458417308ddc15d7390a2f8db50cf65ec399
## | |____6c
## | | |____f7bbf49af4f9fd5103cf9f0a3fa25226b12336
## | |____7b
## | | |____47cd62cfec78fd687bc222d7a5f938c96b711b
## | |____c1
## | | |____cff389562e8bc123e6691a60352fdf839df113
## | |____f5
## | | |____9085df29aed7826a89b23af3f67fc3ab96f643
## | |____info
## | |____pack
Example: Using git cat-file to view tag object
## object 591d4e428e21e11ab90e871f669bb22e8315dadb
## type commit
## tag v1
## tagger cyouh95 <25449416+cyouh95@users.noreply.github.com> 1613761892 -0800
##
## version 1.0
## commit 591d4e428e21e11ab90e871f669bb22e8315dadb
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:32 2021 -0800
##
## second commit
##
## commit 7b47cd62cfec78fd687bc222d7a5f938c96b711b
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:32 2021 -0800
##
## initial commit
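The commands behind the output above are not shown, so here is one way to inspect a tag object, as a sketch in a throwaway repository (placeholder identity; the tag name `v1` mirrors the example). It relies on the fact that an annotated tag's name resolves to the tag object itself, while `v1^{commit}` "peels" the tag to the commit it references.

```shell
# Throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "initial commit"

# Create an annotated tag, which stores a tag object in .git/objects
git tag -a v1 -m "version 1.0"

tag_type=$(git cat-file -t v1)            # object type that the name v1 resolves to
target=$(git rev-parse 'v1^{commit}')     # the commit that the tag object references
git cat-file -p v1                        # object, type, tag, tagger, and message
```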
HEAD and refs/
The HEAD file is a pointer to your current (active) branch. Specifically, it points to the latest commit of that branch (whose hash ID is stored in the refs/ directory). HEAD becomes especially important once we work with multiple branches, as it keeps track of which branch you are currently on.
If we output the contents of HEAD, we see it contains a reference to the main branch:
## ref: refs/heads/main
Following that reference, we can find the hash ID of the latest commit stored inside refs/heads/main:
## 591d4e428e21e11ab90e871f669bb22e8315dadb
We can use git log to verify that this is the hash ID of the latest commit:
## commit 591d4e428e21e11ab90e871f669bb22e8315dadb
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:32 2021 -0800
##
## second commit
##
## commit 7b47cd62cfec78fd687bc222d7a5f938c96b711b
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:32 2021 -0800
##
## initial commit
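The three checks above can be reproduced with the commands below. This is a sketch in a throwaway repository (placeholder identity), assuming Git's default file-based ref storage so that `.git/HEAD` and `.git/refs/heads/main` can be read with `cat`; `git symbolic-ref` and `git log` report the same information through Git itself.

```shell
# Throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "initial commit"

cat .git/HEAD                             # the reference to the current branch
head_ref=$(git symbolic-ref HEAD)         # same reference, retrieved via a git command

branch_tip=$(cat .git/refs/heads/main)    # hash ID stored for the main branch
latest=$(git log -1 --format=%H)          # hash ID of the latest commit according to git log
echo "$branch_tip"
echo "$latest"
```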
More generally, the refs/ directory stores references to all branches. In particular, refs/heads/ stores all your local branches:
## main
On the other hand, refs/remotes/ contains the remote HEAD and your remote-tracking branches. In other words, it is a local copy of your remote repository.
Inside refs/remotes/, there will be a folder for each of your remotes. For example, to view all references for the remote repository named origin, you can look under refs/remotes/origin:
## HEAD
## main
When you run git fetch, it will update the references in refs/remotes/ (i.e., your local copy of the remote repository), but it will not change anything in refs/heads/ (i.e., your local repository). Thus, git fetch is useful if you want a local copy of the most up-to-date changes in the remote repository (e.g., to preview changes), but don’t actually want to merge these changes into your local repository yet.
On the other hand, git pull is effectively a git fetch followed by a git merge (discussed more later). It will update not only refs/remotes/ but also refs/heads/ to bring your local repository up-to-date with the remote.
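To see that git fetch touches only the remote-tracking references, the sketch below simulates a remote with a local bare repository. Everything here is an assumption made for illustration: the directory names `origin.git`, `alice`, and `bob`, the identities, and the file contents are hypothetical stand-ins for a GitHub remote and two collaborators.

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git                        # stands in for the remote on GitHub
git -C origin.git symbolic-ref HEAD refs/heads/main

# "alice" creates the first commit and pushes it
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "initial commit"
git remote add origin "$work/origin.git"
git push -q -u origin main

# "bob" clones, adds a commit, and pushes it to the remote
git clone -q "$work/origin.git" "$work/bob"
cd "$work/bob"
git config user.name "Bob Example"
git config user.email "bob@example.com"
echo "mpg %>% head(5)" >> create_dataset.R
git commit -q -am "second commit"
git push -q origin main

# Back in alice's repository: fetch, then compare the references
cd "$work/alice"
local_before=$(git rev-parse refs/heads/main)
git fetch -q origin
local_after=$(git rev-parse refs/heads/main)             # unchanged by the fetch
remote_tracking=$(git rev-parse refs/remotes/origin/main) # now at bob's commit
git log --oneline main..origin/main                      # preview the fetched commits
```

The last command lists commits that are on `origin/main` but not yet on the local `main`, which is one way to preview changes before merging them.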
## Initialized empty Git repository in /Users/cyouh95/my_git_repo/.git/
# Create new R script
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(5)" >> create_dataset.R
# R script initially starts off under `Untracked Files`
git status
## On branch main
##
## No commits yet
##
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## create_dataset.R
##
## nothing added to commit but untracked files present (use "git add" to track)
## On branch main
##
## No commits yet
##
## Changes to be committed:
## (use "git rm --cached <file>..." to unstage)
##
## new file: create_dataset.R
# Once R script has been added, a blob object is created for it in the .git/objects directory
find .git/objects -print | sed -e 's;[^/]*/;|____;g;s;____|; |;g'
## |____objects
## | |____c1
## | | |____cff389562e8bc123e6691a60352fdf839df113
## | |____info
## | |____pack
# We can use `git hash-object` to verify the hash of the blob object
git hash-object create_dataset.R
## c1cff389562e8bc123e6691a60352fdf839df113
## library(tidyverse)
## mpg %>% head(5)
## On branch main
## nothing to commit, working tree clean
## commit bb6d42974868c92d84064c6848b9221c2415d2ce
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:33 2021 -0800
##
## add create_dataset.R
# Verify that `HEAD` is indeed pointing to the last commit made, which is our initial commit
cat .git/HEAD
cat .git/refs/heads/main
## ref: refs/heads/main
## bb6d42974868c92d84064c6848b9221c2415d2ce
# Further modify R script, which is now a tracked file
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
# R script is now under `Changes not staged for commit`
git status
## On branch main
## Changes not staged for commit:
## (use "git add <file>..." to update what will be committed)
## (use "git checkout -- <file>..." to discard changes in working directory)
##
## modified: create_dataset.R
##
## no changes added to commit (use "git add" and/or "git commit -a")
## diff --git a/create_dataset.R b/create_dataset.R
## index c1cff38..490ec1c 100644
## --- a/create_dataset.R
## +++ b/create_dataset.R
## @@ -1,2 +1,3 @@
## library(tidyverse)
## mpg %>% head(5)
## +df <- mpg %>% filter(year == 2008)
# Add new changes made to R script
git add create_dataset.R
# .git/objects directory now contains blob objects for both versions of R script
# It also contains objects for the commit and root directory tree
find .git/objects -print | sed -e 's;[^/]*/;|____;g;s;____|; |;g'
## |____objects
## | |____49
## | | |____0ec1c138021b8d5c196c26a2a7b3de69afc2d1
## | |____96
## | | |____6cc780d5994bc8a4ed535484cd7f8268e8e874
## | |____bb
## | | |____6d42974868c92d84064c6848b9221c2415d2ce
## | |____c1
## | | |____cff389562e8bc123e6691a60352fdf839df113
## | |____info
## | |____pack
# We can use `git hash-object` to verify the hash for the new blob object
git hash-object create_dataset.R
## 490ec1c138021b8d5c196c26a2a7b3de69afc2d1
## library(tidyverse)
## mpg %>% head(5)
## df <- mpg %>% filter(year == 2008)
## [main 3d162ec] modify create_dataset.R
## 1 file changed, 1 insertion(+)
## commit 3d162ec944c1601a92bb3982f2e3c3f3f95bf791
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:33 2021 -0800
##
## modify create_dataset.R
##
## commit bb6d42974868c92d84064c6848b9221c2415d2ce
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:33 2021 -0800
##
## add create_dataset.R
# Verify that `HEAD` is pointing to the last commit made, which is now our second commit
cat .git/HEAD
cat .git/refs/heads/main
## ref: refs/heads/main
## 3d162ec944c1601a92bb3982f2e3c3f3f95bf791
## tree 6de1187f46bbf4d76cafca7c0e5d3d61db6b5a53
## parent bb6d42974868c92d84064c6848b9221c2415d2ce
## author cyouh95 <25449416+cyouh95@users.noreply.github.com> 1613761893 -0800
## committer cyouh95 <25449416+cyouh95@users.noreply.github.com> 1613761893 -0800
##
## modify create_dataset.R
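One detail worth making explicit from the walk-through above: a blob's hash is computed from the file's content, not from its name, and the object is stored under `.git/objects/` in a folder named after the first two characters of the hash. The sketch below illustrates this in a throwaway repository; the second file name is hypothetical and exists only to show that two files with identical content share one hash.

```shell
# Throwaway repository (no commits are needed for this check)
cd "$(mktemp -d)"
git init -q

echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(5)" >> create_dataset.R
cp create_dataset.R same_content_other_name.R    # hypothetical second file

hash_a=$(git hash-object create_dataset.R)
hash_b=$(git hash-object same_content_other_name.R)
hash_stdin=$(git hash-object --stdin < create_dataset.R)   # content only, no file name involved
echo "$hash_a"

# Staging the file writes the blob to .git/objects/<first 2 chars>/<remaining chars>
git add create_dataset.R
obj_path=".git/objects/$(echo "$hash_a" | cut -c1-2)/$(echo "$hash_a" | cut -c3-)"
ls "$obj_path"
```

This is also why the `.git/objects` listing gains exactly one new blob when a file is modified and re-added: the new content hashes to a new object, while the old one stays in place.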
Below are some common git commands you might use to undo changes:
git checkout
git checkout: Restore working tree files (or switch branches)
git checkout --help
git checkout [<file_name(s)>]: Discards changes to the specified file_name(s) in the working directory
  This applies to tracked files with unstaged changes (i.e., files listed under Changes not staged for commit when you check git status)
  The git checkout command can also be used for switching branches, but that will be covered later
Example: Using git checkout to discard changes to a tracked, unstaged file
# First, create new R script
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(5)" >> create_dataset.R
# Add/commit R script so it is now tracked
git add create_dataset.R
git commit -m "add create_dataset.R"
## library(tidyverse)
## mpg %>% head(5)
# Modify R script
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
# View how create_dataset.R looks now
cat create_dataset.R
## library(tidyverse)
## mpg %>% head(5)
## df <- mpg %>% filter(year == 2008)
## diff --git a/create_dataset.R b/create_dataset.R
## index c1cff38..490ec1c 100644
## --- a/create_dataset.R
## +++ b/create_dataset.R
## @@ -1,2 +1,3 @@
## library(tidyverse)
## mpg %>% head(5)
## +df <- mpg %>% filter(year == 2008)
# Undo those changes using git checkout
git checkout create_dataset.R
# View file after discarding changes
cat create_dataset.R
## library(tidyverse)
## mpg %>% head(5)
git reset
git reset: Reset current HEAD to the specified state
git reset --help
git reset HEAD <file_name(s)>: Unstages the specified file_name(s) from the staging area to the working directory
  This removes the files from the staging area (i.e., files listed under Changes to be committed when you check git status) and will move them back under Changes not staged for commit or Untracked files
  HEAD is a pointer to the latest commit and will restore the staging area/“index” to that state
git reset <commit_hash>: Undo all commits up to (but not including) the specified commit_hash
  The HEAD pointer will be set to the specified commit
  By default, the changes from the undone commits are retained in the working directory as unstaged changes
Example: Using git reset to unstage a file
# First, create new R script
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(5)" >> create_dataset.R
# Add/commit R script so it is now tracked
git add create_dataset.R
git commit -m "add create_dataset.R"
# Modify R script
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
# Add new changes to the staging area
git add create_dataset.R
# Check status to verify it has been staged (listed under `Changes to be committed`)
git status
## On branch main
## Changes to be committed:
## (use "git reset HEAD <file>..." to unstage)
##
## modified: create_dataset.R
# Use git reset to unstage file
git reset HEAD create_dataset.R
# Check status to verify it has been unstaged (listed under `Changes not staged for commit`)
git status
## Unstaged changes after reset:
## M create_dataset.R
## On branch main
## Changes not staged for commit:
## (use "git add <file>..." to update what will be committed)
## (use "git checkout -- <file>..." to discard changes in working directory)
##
## modified: create_dataset.R
##
## no changes added to commit (use "git add" and/or "git commit -a")
Example: Using git reset to undo a commit
# First, create new R script
echo "library(tidyverse)" > create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "add 1st line to create_dataset.R"
# Modify R script
echo "mpg %>% head(5)" >> create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "add 2nd line to create_dataset.R"
## [main 80daa24] add 2nd line to create_dataset.R
## 1 file changed, 1 insertion(+)
## commit 80daa245825911c7a0c01e4571c3b48ebd50950b
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:34 2021 -0800
##
## add 2nd line to create_dataset.R
##
## commit 9c202be439d4865f27d12d93f2623565b5fcb55d
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:33 2021 -0800
##
## add 1st line to create_dataset.R
# Specify the hash ID of the commit to undo up to
git reset $(git rev-list HEAD | tail -n 1) # this retrieves the first commit hash
# View commit log - the 2nd commit has been removed
git log
## Unstaged changes after reset:
## M create_dataset.R
## commit 9c202be439d4865f27d12d93f2623565b5fcb55d
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:33 2021 -0800
##
## add 1st line to create_dataset.R
# Notice that the changes to the file are still retained in the working directory
cat create_dataset.R
## library(tidyverse)
## mpg %>% head(5)
git revert
git revert: Revert existing commit(s)
git revert --help
git revert --no-edit <commit_hash>: Revert the changes introduced by the specified commit_hash
  The difference between git revert and git reset is that the former does not actually remove any past commits, but instead creates a new commit that reverts changes (see figure below)
  For commits that have already been shared with others, prefer git revert so that it does not permanently erase history
  For commits that exist only in your local repository, git reset may also be an option
  The --no-edit option means that you will use the default message for the revert commit
  Without --no-edit, you’ll be taken to a screen where you have a chance to edit the commit message of the new commit. Just enter :q to use the default message.
Credit: NUKE Designs, Git revert
Example: Using git revert to revert a commit
# First, create new R script
echo "library(tidyverse)" > create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "add 1st line to create_dataset.R"
# Modify R script
echo "mpg %>% head(5)" >> create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "add 2nd line to create_dataset.R"
## [main 768bd1e] add 2nd line to create_dataset.R
## 1 file changed, 1 insertion(+)
## commit 768bd1eb8af41ce0fd73f938792aa2cc4e87f984
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:34 2021 -0800
##
## add 2nd line to create_dataset.R
##
## commit 05778941d0829b1496b9b6d970ab62685e540529
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:34 2021 -0800
##
## add 1st line to create_dataset.R
# Specify the hash ID of the unwanted commit
git revert --no-edit $(git rev-parse --short HEAD) # git rev-parse retrieves latest commit hash
# View commit log
git log
## [main bef198c] Revert "add 2nd line to create_dataset.R"
## 1 file changed, 1 deletion(-)
## commit bef198cb1fa62bfe71560ead4dc708a73d0bbb72
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:34 2021 -0800
##
## Revert "add 2nd line to create_dataset.R"
##
## This reverts commit 768bd1eb8af41ce0fd73f938792aa2cc4e87f984.
##
## commit 768bd1eb8af41ce0fd73f938792aa2cc4e87f984
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:34 2021 -0800
##
## add 2nd line to create_dataset.R
##
## commit 05778941d0829b1496b9b6d970ab62685e540529
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:34 2021 -0800
##
## add 1st line to create_dataset.R
## library(tidyverse)
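To put the two approaches side by side, the sketch below builds the same two-commit history twice and undoes the second commit once with git reset and once with git revert. The helper function `make_repo` and the identity are hypothetical, introduced only for this illustration. Counting commits with `git rev-list --count` shows that reset shortens the history while revert lengthens it.

```shell
# Hypothetical helper: create a throwaway repository with two commits and cd into it
make_repo() {
  cd "$(mktemp -d)" || return 1
  git init -q
  git symbolic-ref HEAD refs/heads/main
  git config user.name "Example User"
  git config user.email "user@example.com"
  echo "library(tidyverse)" > create_dataset.R
  git add create_dataset.R
  git commit -q -m "add 1st line to create_dataset.R"
  echo "mpg %>% head(5)" >> create_dataset.R
  git add create_dataset.R
  git commit -q -m "add 2nd line to create_dataset.R"
}

# Variant 1: git reset moves the branch back; the 2nd commit drops out of the log
make_repo
git reset -q 'HEAD~1'
reset_commits=$(git rev-list --count HEAD)
reset_lines=$(wc -l < create_dataset.R | tr -d ' ')    # file still has both lines (unstaged)

# Variant 2: git revert adds a new commit that undoes the 2nd commit
make_repo
git revert --no-edit HEAD > /dev/null
revert_commits=$(git rev-list --count HEAD)
revert_lines=$(wc -l < create_dataset.R | tr -d ' ')   # file is back to one line

echo "reset:  $reset_commits commit(s), $reset_lines line(s) in file"
echo "revert: $revert_commits commit(s), $revert_lines line(s) in file"
```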
What is a branch?
Credit: Modified from W3 docs, Git branch
Defining branches in terms of commits:
Credit: Modified from Mastering git branches by Henrique Mota
Why use branches?
predict_grad.Rmd. For example, one person writing functions to clean data and create analysis variables and another person writing functions to run models and store model results.
predict_grad.Rmd at the same time
git branch
git branch: List, create, or delete branches
git branch --help
git branch [<option(s)>]: List existing branches (default: only local branches)
  An * appears next to your current branch
  -a: List all branches, both local and remote (remote branches will start with remotes/)
  -r: List only remote branches
  -v: Display details about latest commits next to each branch
git branch <branch_name>: Create new local branch
git branch -d <branch_name>: Delete local branch
Example: Using git branch to list branches
Let’s create a new git repository in the example below. Note that we will not be able to list branches until we’ve made at least 1 commit:
# Initialize a new git repository in `my_git_repo` directory
cd my_git_repo
git init
# Note that you won't be able to list branches until you've made at least 1 commit
git branch
## Initialized empty Git repository in /Users/cyouh95/my_git_repo/.git/
# Create new R script
echo "library(tidyverse)" > create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "import tidyverse in create_dataset.R"
# Now we can see the `main` branch listed, with an `*` indicating it is our current branch
git branch
## * main
We can use the -v option to list branches with more details about the latest commit on each branch:
## * main 6144d4d import tidyverse in create_dataset.R
The -a option will list both local and remote branches. Remote branches will start with remotes/ in the output. They will also include both the remote repository name and the branch name (e.g., origin/main in the example below). In addition to remote branches, we’ll also see the remote HEAD listed and where it’s pointing to (e.g., remote HEAD is pointing to remote main branch in the example below):
## * main
## remotes/origin/HEAD -> origin/main
## remotes/origin/main
To list only information on remote branches, we can use the -r option. Notice that the names do not have remotes/ prepended, as that only appears when listing all branches using -a to be able to distinguish between local and remote branches:
## origin/HEAD -> origin/main
## origin/main
Example: Using git branch to create new branch
## * main
## dev
## * main
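The commands for this example are not shown above, so the following is a sketch of how the branch could be created, run in a throwaway repository (placeholder identity; the `scratch` branch is hypothetical and only demonstrates `-d`). Note that git branch <branch_name> creates the branch but leaves you on your current branch.

```shell
# Throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "import tidyverse in create_dataset.R"

git branch          # only main exists so far
git branch dev      # create dev; we stay on main
git branch          # dev is listed, and * still marks main

current=$(git rev-parse --abbrev-ref HEAD)
branches=$(git for-each-ref --format='%(refname:short)' refs/heads/)

# A new branch starts out pointing to the same commit as the branch it was created from
dev_tip=$(git rev-parse dev)
main_tip=$(git rev-parse main)

# Create and delete a hypothetical branch named scratch
git branch scratch
git branch -d scratch
```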
git checkout
git checkout: Switch branches
git checkout --help
git checkout <branch_name>: Switches to an existing branch named branch_name
git checkout -b <branch_name>: Creates a new branch named branch_name and switches to it
Credit: Modified from Pham Quy, Git tutorial
Example: Using git checkout to create a new branch and switch to it
## * main
## Switched to a new branch 'dev'
## * dev
## main
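Since the commands that produced this output are not shown, here is a sketch of one way to reproduce it in a throwaway repository (placeholder identity). It also reads `.git/HEAD` before and after switching, which connects branch switching back to the HEAD pointer discussed earlier; this assumes Git's default file-based ref storage.

```shell
# Throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "import tidyverse in create_dataset.R"

head_before=$(cat .git/HEAD)     # HEAD references main

git checkout -b dev              # create dev and switch to it in one step
git branch                       # * now marks dev
head_after=$(cat .git/HEAD)      # HEAD now references dev

git checkout -q main             # switch back to an existing branch
current=$(git rev-parse --abbrev-ref HEAD)
```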
What is an upstream branch?
git push --set-upstream <remote_name> <branch_name> (or equivalently, git push -u <remote_name> <branch_name>)
main branch to GitHub for the first time
dev branch is tracking the remote dev branch (i.e., the upstream branch). Recall that under the hood, we also have a local copy of the remote repository, so origin/dev here is this local, remote-tracking branch.
Credit: devconnected, How To Set Upstream Branch on Git
Example: Pushing a new local branch to the remote
When you create a new local branch, you may choose to push it to the remote if you want a copy of it on GitHub, or if you want others to be able to contribute to it. When you push a local branch for the first time, you need to set the upstream branch; otherwise, a plain git push will be rejected. All subsequent pushes after this first one can just be git push.
In the example below, let’s say we created a new branch dev that we want to push to a remote named origin:
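A sketch of that first push is shown below. Because a GitHub repository cannot be assumed here, the remote is simulated with a local bare repository; the directory names `origin.git` and `local` and the identity are hypothetical. After the push, `dev@{upstream}` reports which remote-tracking branch the local dev branch follows.

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git          # stands in for the remote on GitHub

git init -q local
cd local
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "import tidyverse in create_dataset.R"
git remote add origin "$work/origin.git"
git push -q -u origin main

# Create the new branch and push it for the first time, setting its upstream
git checkout -q -b dev
git push -q --set-upstream origin dev

upstream=$(git rev-parse --abbrev-ref 'dev@{upstream}')
echo "$upstream"
git branch -vv                          # shows each local branch with its upstream in brackets
```

Once the upstream is set, later pushes from dev can be a plain `git push`.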
What is a merge?
Credit: Modified from Eduard Lebedyuk
Merge terminology:
How programmers use branches and merges in day-to-day work:
Types of merges:
HEAD to point to the most recent commit from the “target branch”
Credit: Modified from Atlassian, Git merge
git merge
git merge: Merge branches
git merge --help
git merge <branch_name>: All changes from branch_name will be merged into the current branch
git merge --abort: If a conflict arises during the merge, this can be run to restore both branches to their original states
Example: Using git merge for fast-forward merge
Continuing from previous examples, we have the main and dev branches, which both point to the same initial commit:
## commit 6144d4d1efa49a372077d6053173f765e8f05887
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:34 2021 -0800
##
## import tidyverse in create_dataset.R
## Switched to branch 'dev'
## commit 6144d4d1efa49a372077d6053173f765e8f05887
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:34 2021 -0800
##
## import tidyverse in create_dataset.R
# View content of R script, which is the same on both `main` and `dev` branches
cat create_dataset.R
## library(tidyverse)
Now, let’s make a second commit on the dev branch:
# Modify R script
echo "mpg %>% head(5)" >> create_dataset.R
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "manipulate mpg dataset"
## [dev e3cbdf2] manipulate mpg dataset
## 1 file changed, 2 insertions(+)
## commit e3cbdf20bdcdefbf07e21021182a1cd0417ee044
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:35 2021 -0800
##
## manipulate mpg dataset
##
## commit 6144d4d1efa49a372077d6053173f765e8f05887
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:34 2021 -0800
##
## import tidyverse in create_dataset.R
Let’s switch back to the main branch and merge in dev. Since the dev branch is ahead of main by 1 commit, the changes can be combined using a fast-forward merge:
## Switched to branch 'main'
## Updating 6144d4d..e3cbdf2
## Fast-forward
## create_dataset.R | 2 ++
## 1 file changed, 2 insertions(+)
## commit e3cbdf20bdcdefbf07e21021182a1cd0417ee044
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:35 2021 -0800
##
## manipulate mpg dataset
##
## commit 6144d4d1efa49a372077d6053173f765e8f05887
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:34 2021 -0800
##
## import tidyverse in create_dataset.R
Let’s examine the git object associated with the commit:
# Commit object hash
git rev-parse HEAD # git rev-parse retrieves latest commit hash
git cat-file -t $(git rev-parse HEAD) # type = commit
git cat-file -p $(git rev-parse HEAD)
## e3cbdf20bdcdefbf07e21021182a1cd0417ee044
## commit
## tree 6de1187f46bbf4d76cafca7c0e5d3d61db6b5a53
## parent 6144d4d1efa49a372077d6053173f765e8f05887
## author cyouh95 <25449416+cyouh95@users.noreply.github.com> 1613761895 -0800
## committer cyouh95 <25449416+cyouh95@users.noreply.github.com> 1613761895 -0800
##
## manipulate mpg dataset
Examine the “tree” object associated with the commit:
git cat-file -t 6de1187f46bbf4d76cafca7c0e5d3d61db6b5a53 # type = tree
git cat-file -p 6de1187f46bbf4d76cafca7c0e5d3d61db6b5a53
## tree
## 100644 blob 490ec1c138021b8d5c196c26a2a7b3de69afc2d1 create_dataset.R
Examine the “blob” object (file) associated with the commit:
git cat-file -t 490ec1c138021b8d5c196c26a2a7b3de69afc2d1 # type = blob
git cat-file -p 490ec1c138021b8d5c196c26a2a7b3de69afc2d1
## blob
## library(tidyverse)
## mpg %>% head(5)
## df <- mpg %>% filter(year == 2008)
Examine the “parent” object associated with this commit:
# Parent commit hash (with only two commits in the history, the parent is also the first commit)
git rev-list HEAD | tail -n 1
git cat-file -t $(git rev-list HEAD | tail -n 1) # type = commit
git cat-file -p $(git rev-list HEAD | tail -n 1)
## 6144d4d1efa49a372077d6053173f765e8f05887
## commit
## tree cb70185218351236255cdea1297210ceeaf6e3b5
## author cyouh95 <25449416+cyouh95@users.noreply.github.com> 1613761894 -0800
## committer cyouh95 <25449416+cyouh95@users.noreply.github.com> 1613761894 -0800
##
## import tidyverse in create_dataset.R
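The switch-and-merge step itself is not shown above, so the sketch below replays the whole fast-forward scenario in a throwaway repository (placeholder identity). After the merge, main and dev point to the same commit, and that commit has a single parent, confirming that no separate merge commit was created.

```shell
# Throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "import tidyverse in create_dataset.R"

# Make a second commit on dev only
git checkout -q -b dev
echo "mpg %>% head(5)" >> create_dataset.R
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
git add create_dataset.R
git commit -q -m "manipulate mpg dataset"

# main has no commits of its own since dev branched off, so Git can fast-forward
git checkout -q main
git merge dev

main_tip=$(git rev-parse main)
dev_tip=$(git rev-parse dev)
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
total_commits=$(git rev-list --count HEAD)
```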
Example: Using git merge for 3-way merge
Continuing from previous examples, we have the main and dev branches, which both contain the same two commits:
## commit e3cbdf20bdcdefbf07e21021182a1cd0417ee044
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:35 2021 -0800
##
## manipulate mpg dataset
##
## commit 6144d4d1efa49a372077d6053173f765e8f05887
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:34 2021 -0800
##
## import tidyverse in create_dataset.R
## dev
## * main
##
## library(tidyverse)
## mpg %>% head(5)
## df <- mpg %>% filter(year == 2008)
Now, let’s suppose the two branches diverge, both making changes to the R script:
# Modify R script
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(10)" >> create_dataset.R # this line is modified
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
# Add and commit changes
git add create_dataset.R
git commit -m "update head() on line 2"
## [main 422e291] update head() on line 2
## 1 file changed, 1 insertion(+), 1 deletion(-)
View updated content of R script on the main branch, which now shows head(10) instead of head(5):
## dev
## * main
##
## library(tidyverse)
## mpg %>% head(10)
## df <- mpg %>% filter(year == 2008)
Switch to the dev branch, and make a change to create_dataset.R:
# Switch to `dev` branch
git checkout dev
# Modify R script
echo "df <- df %>% filter(manufacturer == 'audi')" >> create_dataset.R # add new line
# Add and commit changes
git add create_dataset.R
git commit -m "add additional filter() on line 4"
## Switched to branch 'dev'
## [dev d21f48c] add additional filter() on line 4
## 1 file changed, 1 insertion(+)
View updated content of R script on the dev branch, which now has additional filter() line at the end:
## * dev
## main
##
## library(tidyverse)
## mpg %>% head(5)
## df <- mpg %>% filter(year == 2008)
## df <- df %>% filter(manufacturer == 'audi')
Before we attempt to merge main and dev branches, we can use git diff to compare the two branches:
git diff <branch1_name> <branch2_name>
## diff --git a/create_dataset.R b/create_dataset.R
## index da2f5c5..6665541 100644
## --- a/create_dataset.R
## +++ b/create_dataset.R
## @@ -1,3 +1,4 @@
## library(tidyverse)
## -mpg %>% head(10)
## +mpg %>% head(5)
## df <- mpg %>% filter(year == 2008)
## +df <- df %>% filter(manufacturer == 'audi')
Let’s switch back to the main branch and merge in dev. Since both branches made changes to the R script on different lines, the changes can be combined without any conflicts via a 3-way merge:
## Switched to branch 'main'
## Auto-merging create_dataset.R
## Merge made by the 'recursive' strategy.
## create_dataset.R | 1 +
## 1 file changed, 1 insertion(+)
## commit 6c06777d890eecbe4d35bdd2b867f222b8f5a8fa
## Merge: 422e291 d21f48c
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:35 2021 -0800
##
## Merge branch 'dev' into main
##
## commit 422e29129c8341a655aa00d6dd9d0abc857080a5
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:35 2021 -0800
##
## update head() on line 2
##
## commit d21f48c76a74f103307b3bbf7dd3ed39f6fb0eda
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:35 2021 -0800
##
## add additional filter() on line 4
##
## commit e3cbdf20bdcdefbf07e21021182a1cd0417ee044
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:35 2021 -0800
##
## manipulate mpg dataset
##
## commit 6144d4d1efa49a372077d6053173f765e8f05887
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:34 2021 -0800
##
## import tidyverse in create_dataset.R
## library(tidyverse)
## mpg %>% head(10)
## df <- mpg %>% filter(year == 2008)
## df <- df %>% filter(manufacturer == 'audi')
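The sketch below replays this 3-way merge from scratch in a throwaway repository (placeholder identity), using the same file contents as the example. The `--no-edit` option accepts the default merge commit message. Unlike a fast-forward merge, the resulting commit records two `parent` lines, one for each branch tip.

```shell
# Throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(5)" >> create_dataset.R
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
git add create_dataset.R
git commit -q -m "manipulate mpg dataset"
git branch dev

# main: modify line 2
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(10)" >> create_dataset.R
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
git commit -q -am "update head() on line 2"

# dev: append a 4th line
git checkout -q dev
echo "df <- df %>% filter(manufacturer == 'audi')" >> create_dataset.R
git commit -q -am "add additional filter() on line 4"

# Merge dev into main; the branches have diverged, so a merge commit is created
git checkout -q main
git merge -q --no-edit dev

parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
cat create_dataset.R
```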
git pull
git pull: Incorporate remote changes into your current branch
git pull --help
git pull: This is equivalent to a git fetch followed by a git merge to incorporate remote changes to your current branch
  git fetch is useful if you want a local copy of the most up-to-date changes in the remote repository (e.g., to preview changes), but don’t actually want to merge these changes into your local repository yet. On the other hand, running git pull will directly incorporate the changes.
  git fetch will incorporate changes into your remote-tracking branch (e.g., origin/main, your local copy of the remote main branch) but not your local branch (e.g., your local main branch). Then, git merge can merge the change from your remote-tracking branch into your local branch.
Credit: Modified from Medium, Git Fetch vs Git Pull
Example: Using git pull to incorporate remote changes
Let’s say your remote branch is ahead of your local branch by some commits. You can run git pull to incorporate those changes:
After you run the command, you may see some output indicating the progress as remote changes are being fetched:
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.
Then, the output will look something like the below:
origin/main branch, which is our local copy of the remote main branch
git pull is just git fetch followed by git merge)
From github.com:anyone-can-cook/student_lastname_firstname
1eeaff7..6c3e46f main -> origin/main
Updating 1eeaff7..6c3e46f
Fast-forward
README.md | 2 ++
my_script.R | 4 ++--
2 files changed, 4 insertions(+), 2 deletions(-)
As we’ll see in the next example, the first 2 lines of the output come from git fetch being run and the remaining lines come from git merge.
Example: Using git fetch and git merge to incorporate remote changes
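The pull can be tried end to end without a GitHub account by simulating the remote with a local bare repository. In the sketch below, the directory names `origin.git`, `alice`, and `bob`, the identities, and the file contents are all hypothetical. The `--ff-only` option is an addition for this sketch: it tells git pull to proceed only when a fast-forward is possible, which is the situation in this example.

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git                        # stands in for the remote on GitHub
git -C origin.git symbolic-ref HEAD refs/heads/main

# "alice" creates the first commit and pushes it
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "library(tidyverse)" > my_script.R
git add my_script.R
git commit -q -m "initial commit"
git remote add origin "$work/origin.git"
git push -q -u origin main

# "bob" pushes a second commit, so the remote is now ahead of alice's local branch
git clone -q "$work/origin.git" "$work/bob"
cd "$work/bob"
git config user.name "Bob Example"
git config user.email "bob@example.com"
echo "mpg %>% head(5)" >> my_script.R
git commit -q -am "second commit"
git push -q origin main

# alice pulls: origin/main is updated (fetch) and then merged into main (merge)
cd "$work/alice"
git pull --ff-only

local_tip=$(git rev-parse main)
tracking_tip=$(git rev-parse origin/main)
remote_tip=$(git -C "$work/origin.git" rev-parse main)
```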
Running git pull essentially performs a git fetch followed by git merge. If we only want to fetch the remote changes to our local repository but not incorporate them into our current branch, we can use git fetch:
From github.com:anyone-can-cook/student_lastname_firstname
1eeaff7..6c3e46f main -> origin/main
We can verify that the fetch only updated our remote-tracking branch origin/main (i.e., our local copy of the remote main branch) and not our local main branch by checking the commit history of the branches.
Assuming we are currently on our local main branch, we can run git log to view the commit history. In the output, we see HEAD -> main next to the most recent commit, indicating that HEAD is pointing to this commit on the main branch:
commit e329908682dfefba0417bd7337cc660d0d5f133d (HEAD -> main)
Author: username <email@example.com>
Date: Fri Jan 22 11:15:50 2021 -0800
initial commit
Next, we can check the commit log of the remote-tracking branch origin/main. In the output below, we can see that the changes have indeed been fetched to this branch, as indicated by the presence of the second commit. In parentheses next to the commits, we can again see that our local main branch still only contains the first commit while origin/main and origin/HEAD have been updated with the second. HEAD also appears next to the second commit, but without the -> arrow, which indicates that HEAD points directly at this commit (a detached HEAD, e.g., after checking out origin/main to inspect it) rather than at our local main branch:
commit 1eeaff75a681213890e5ce4850d17a1672a4ada6 (HEAD, origin/main, origin/HEAD)
Author: username <email@example.com>
Date: Fri Jan 22 11:27:40 2021 -0800
second commit
commit e329908682dfefba0417bd7337cc660d0d5f133d (main)
Author: username <email@example.com>
Date: Fri Jan 22 11:15:50 2021 -0800
initial commit
After we are satisfied with the fetched changes, we can manually merge them into our local main branch:
Updating 1eeaff7..6c3e46f
Fast-forward
README.md | 2 ++
my_script.R | 4 ++--
2 files changed, 4 insertions(+), 2 deletions(-)
Alternatively, we could have just run git pull instead of git merge origin/main and it would’ve also merged in the changes (after performing git fetch again).
If we check the commit history on our local main branch again, we can see it has now been updated:
commit 1eeaff75a681213890e5ce4850d17a1672a4ada6 (HEAD -> main, origin/main, origin/HEAD)
Author: username <email@example.com>
Date: Fri Jan 22 11:27:40 2021 -0800
second commit
commit e329908682dfefba0417bd7337cc660d0d5f133d
Author: username <email@example.com>
Date: Fri Jan 22 11:15:50 2021 -0800
initial commit
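The fetch, inspect, then merge sequence above can be tried safely in a scratch directory. The sketch below is only an illustration under stated assumptions: a local bare repository (remote.git) stands in for GitHub, and the names seed, mine, and collaborator are hypothetical, not part of the class repository.

```shell
set -e
work=$(mktemp -d)                       # scratch directory (hypothetical setup)
cd "$work"
# Throwaway identity so commits work on any machine
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Build a stand-in "remote" with one commit on main
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo "# demo" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "initial commit"
git clone -q --bare seed remote.git

# Two clones: ours, and a collaborator who pushes a second commit
git clone -q remote.git mine
git clone -q remote.git collaborator
echo "more text" >> collaborator/README.md
git -C collaborator commit -q -am "second commit"
git -C collaborator push -q origin main

cd mine
before=$(git rev-parse main)
git fetch -q origin                              # updates origin/main only
git --no-pager log --oneline main..origin/main   # fetched but not yet merged
git merge -q origin/main                         # fast-forwards local main
after=$(git rev-parse main)
```

The range main..origin/main lists exactly the commits that git merge (or git pull) would bring into the local branch, which makes it a convenient way to preview fetched changes.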
When attempting a git merge, conflicts can arise either when starting the merge or during the merge. (From Git merge conflicts)
When starting a merge, Git will first check whether you have any uncommitted changes in the working directory or staging area that the merge would overwrite. If so, Git will abort the merge completely and display an error message that looks like this:
error: Your local changes to the following files would be overwritten by merge:
<file_name>
Please commit your changes or stash them before you merge.
Aborting
During a 3-way merge when both branches made changes to the same line(s) of the same file(s), a conflict will occur. The error message would look like this:
Auto-merging <file_name>
CONFLICT (content): Merge conflict in <file_name>
Automatic merge failed; fix conflicts and then commit the result.
If you open the file that failed to merge, you will see that Git has marked the line(s) that were conflicting:
<normal_line_of_code>
<normal_line_of_code>
<<<<<<< HEAD
<conflicted_line_of_code__current_branch_version>
=======
<conflicted_line_of_code__target_branch_version>
>>>>>>> <branch_name>
<normal_line_of_code>
<normal_line_of_code>
These conflicts will need to be resolved manually (described in next section), or the merge can be aborted using git merge --abort.
Example: Merge conflict when starting a merge
Continuing from previous examples, our main branch currently looks like this:
## commit 6c06777d890eecbe4d35bdd2b867f222b8f5a8fa
## Merge: 422e291 d21f48c
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:35 2021 -0800
##
## Merge branch 'dev' into main
##
## commit 422e29129c8341a655aa00d6dd9d0abc857080a5
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:35 2021 -0800
##
## update head() on line 2
##
## commit d21f48c76a74f103307b3bbf7dd3ed39f6fb0eda
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:35 2021 -0800
##
## add additional filter() on line 4
##
## commit e3cbdf20bdcdefbf07e21021182a1cd0417ee044
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:35 2021 -0800
##
## manipulate mpg dataset
##
## commit 6144d4d1efa49a372077d6053173f765e8f05887
## Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
## Date: Fri Feb 19 11:11:34 2021 -0800
##
## import tidyverse in create_dataset.R
## dev
## * main
##
## library(tidyverse)
## mpg %>% head(10)
## df <- mpg %>% filter(year == 2008)
## df <- df %>% filter(manufacturer == 'audi')
Let’s create a new branch called revision that branches off main, then make a new commit on this branch:
# Create and switch to new branch
git checkout -b revision
# Modify R script
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(10)" >> create_dataset.R
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
echo "df <- df %>% filter(manufacturer == 'lincoln')" >> create_dataset.R # this line is modified
# Add and commit change
git add create_dataset.R
git commit -m "filter for lincoln instead of audi"
## Switched to a new branch 'revision'
## [revision c6d326f] filter for lincoln instead of audi
## 1 file changed, 1 insertion(+), 1 deletion(-)
View updated content of R script on the revision branch, which now filters for lincoln instead of audi on the last line:
## dev
## main
## * revision
##
## library(tidyverse)
## mpg %>% head(10)
## df <- mpg %>% filter(year == 2008)
## df <- df %>% filter(manufacturer == 'lincoln')
Back on the main branch, let’s modify the same line in the R script:
# Switch back to `main` branch
git checkout main
# Modify R script
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(10)" >> create_dataset.R
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
echo "df <- df %>% filter(manufacturer == 'chevrolet')" >> create_dataset.R # this line is modified
## Switched to branch 'main'
Notice that we have uncommitted changes in the working directory:
## On branch main
## Changes not staged for commit:
## (use "git add <file>..." to update what will be committed)
## (use "git checkout -- <file>..." to discard changes in working directory)
##
## modified: create_dataset.R
##
## no changes added to commit (use "git add" and/or "git commit -a")
If we try to merge changes from revision into main now, there will be a merge conflict because we have uncommitted changes. The merge will be aborted:
## error: Your local changes to the following files would be overwritten by merge:
## create_dataset.R
## Please commit your changes or stash them before you merge.
## Aborting
Example: Merge conflict during a merge
Continuing from the previous example, let’s say we committed our change to create_dataset.R on the main branch:
# Add and commit change
git add create_dataset.R
git commit -m "filter for chevrolet instead of audi"
## [main a40b508] filter for chevrolet instead of audi
## 1 file changed, 1 insertion(+), 1 deletion(-)
View updated content of R script on the main branch, which now filters for chevrolet instead of audi on the last line:
## dev
## * main
## revision
##
## library(tidyverse)
## mpg %>% head(10)
## df <- mpg %>% filter(year == 2008)
## df <- df %>% filter(manufacturer == 'chevrolet')
Recall that create_dataset.R on the revision branch looks like this:
## library(tidyverse)
## mpg %>% head(10)
## df <- mpg %>% filter(year == 2008)
## df <- df %>% filter(manufacturer == 'lincoln')
If we try to merge changes from revision into main now, there will be a merge conflict because both branches modified the same line of the same file:
## Auto-merging create_dataset.R
## CONFLICT (content): Merge conflict in create_dataset.R
## Automatic merge failed; fix conflicts and then commit the result.
You can also tell which file(s) failed to merge by checking git status:
## On branch main
## You have unmerged paths.
## (fix conflicts and run "git commit")
## (use "git merge --abort" to abort the merge)
##
## Unmerged paths:
## (use "git add <file>..." to mark resolution)
##
## both modified: create_dataset.R
##
## no changes added to commit (use "git add" and/or "git commit -a")
The file(s) that failed to merge will contain markings by Git that indicate which line(s) are conflicted:
## library(tidyverse)
## mpg %>% head(10)
## df <- mpg %>% filter(year == 2008)
## <<<<<<< HEAD
## df <- df %>% filter(manufacturer == 'chevrolet')
## =======
## df <- df %>% filter(manufacturer == 'lincoln')
## >>>>>>> revision
What to do when you encounter a merge conflict?
Use git merge --abort to abort the merge and restore the branches back to their original states
Or edit the conflicted file(s): find the conflict markers (<<<<<<< HEAD, =======, >>>>>>> <branch_name>) and choose which version of the conflicted line to keep
git add the file(s) after you are done resolving the conflicts
git commit -m "<commit_message>" to complete the merge
Example: Resolving a merge conflict
## library(tidyverse)
## mpg %>% head(10)
## df <- mpg %>% filter(year == 2008)
## <<<<<<< HEAD
## df <- df %>% filter(manufacturer == 'chevrolet')
## =======
## df <- df %>% filter(manufacturer == 'lincoln')
## >>>>>>> revision
We can manually edit the file to resolve the conflicts. Let’s say we choose to filter for 'volkswagen' instead:
## library(tidyverse)
## mpg %>% head(10)
## df <- mpg %>% filter(year == 2008)
## df <- df %>% filter(manufacturer == 'volkswagen')
Finally, we can add and commit the file to complete the merge:
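As a rough, self-contained sketch of both ways out of a conflict (aborting, or resolving then adding and committing), the script below recreates a one-line version of create_dataset.R in a scratch repository. The demo directory and the final commit message are assumptions made for illustration.

```shell
set -e
work=$(mktemp -d)                       # scratch directory (hypothetical setup)
cd "$work"
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main   # name the initial branch main

# The same line is edited on both branches, so merging them conflicts
echo "df <- df %>% filter(manufacturer == 'audi')" > create_dataset.R
git add create_dataset.R
git commit -q -m "manipulate mpg dataset"
git checkout -q -b revision
echo "df <- df %>% filter(manufacturer == 'lincoln')" > create_dataset.R
git commit -q -am "filter for lincoln instead of audi"
git checkout -q main
echo "df <- df %>% filter(manufacturer == 'chevrolet')" > create_dataset.R
git commit -q -am "filter for chevrolet instead of audi"

# Option 1: abort. `|| true` keeps the script going after the failed merge
git merge revision || true
conflicted=$(git diff --name-only --diff-filter=U)   # files with unmerged paths
git merge --abort
after_abort=$(cat create_dataset.R)                  # main's version, no markers

# Option 2: resolve by hand, then add and commit
git merge revision || true
echo "df <- df %>% filter(manufacturer == 'volkswagen')" > create_dataset.R
git add create_dataset.R                             # marks the conflict as resolved
git commit -q -m "merge revision, filter for volkswagen"
git --no-pager log --oneline --graph
```

The resulting commit has two parents (the tips of main and revision), which is what distinguishes a merge commit from an ordinary one.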
What is a pull request?
“Pull requests let you tell others about changes you’ve pushed to a branch in a repository on GitHub. Once a pull request is opened, you can discuss and review the potential changes with collaborators and add follow-up commits before your changes are merged into the base branch.” – GitHub Help
Why make a pull request?
Example: Alternative to pull request: Merging changes directly into main
Let’s say we create a new R script and add/commit that to the main branch:
# Create new R script
echo "library(tidyverse)" > create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "import tidyverse library"
Then, we create a new branch and make further changes to the R script on the branch:
# Create and switch to new branch
git checkout -b dev
# Modify R script
echo "mpg %>% head(5)" >> create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "preview mpg dataset"
## Switched to a new branch 'dev'
##
## [dev 336a3d9] preview mpg dataset
## 1 file changed, 1 insertion(+)
At this point, we can push this new branch to the remote if we wanted to open a pull request. But the alternative is to directly merge the changes to main:
## Switched to branch 'main'
## Updating aca9e0f..336a3d9
## Fast-forward
## create_dataset.R | 1 +
## 1 file changed, 1 insertion(+)
Then, we can push the changes to the remote’s main branch, which would also be the ultimate goal of a pull request:
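A minimal end-to-end sketch of this direct-merge workflow is shown here. It assumes a local bare repository (remote.git) as a stand-in for the GitHub remote, so the push can be tried without touching a real repository.

```shell
set -e
work=$(mktemp -d)                       # scratch directory (hypothetical setup)
cd "$work"
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q --bare remote.git           # bare repository standing in for GitHub
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main   # name the initial branch main
git remote add origin "$work/remote.git"

# Commit on main, then further work on dev
echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "import tidyverse library"
git checkout -q -b dev
echo "mpg %>% head(5)" >> create_dataset.R
git commit -q -am "preview mpg dataset"

# Merge dev directly into main, then push main to the remote
git checkout -q main
git merge -q dev                        # fast-forward: main had no new commits
git push -q origin main
```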
All image credits: GitHub Help
Creating a topical branch:
Making the pull request:
On GitHub, select your branch and click New pull request:
Add a title and (optionally) a description for your pull request. You can also @ users/teams if you want:
Click Create Pull Request:
Your pull request will appear under the tab Pull requests:
Assigning reviewers:
On the right-hand side of the pull request, you are also able to assign Reviewers or Assignees, similar to an issue:
Reviewers should be people you want to review the changes you made, while Assignees could be anyone else more generally involved in the pull request
The users listed under Reviewers (unlike Assignees) will also have a status icon:
Example: Creating a pull request
Similar to the previous example, let’s say we create a new R script and add/commit it to the main branch:
# Create new R script
echo "library(tidyverse)" > create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "import tidyverse library"
Then, we create a new branch and make further changes to the R script on the branch:
# Create and switch to new branch
git checkout -b dev
# Modify R script
echo "mpg %>% head(5)" >> create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "preview mpg dataset"
## Switched to a new branch 'dev'
##
## [dev 336a3d9] preview mpg dataset
## 1 file changed, 1 insertion(+)
At this point, we can push this new branch to the remote repository. Remember to set the upstream branch if this is the first time you are pushing the branch to remote:
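A sketch of that first push, again assuming a local bare repository (remote.git) in place of GitHub: the --set-upstream flag (short form -u) pushes the branch and records origin/dev as its upstream, so later git push and git pull calls on dev need no arguments.

```shell
set -e
work=$(mktemp -d)                       # scratch directory (hypothetical setup)
cd "$work"
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q --bare remote.git           # bare repository standing in for GitHub
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main   # name the initial branch main
git remote add origin "$work/remote.git"

echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "import tidyverse library"
git push -q -u origin main

git checkout -q -b dev
echo "mpg %>% head(5)" >> create_dataset.R
git commit -q -am "preview mpg dataset"

# First push of dev: create it on the remote and set its upstream
git push -q --set-upstream origin dev

# Confirm which remote-tracking branch dev now follows
upstream=$(git rev-parse --abbrev-ref 'dev@{upstream}')
echo "$upstream"
```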
All the subsequent steps to open the pull request will be performed on GitHub.
There are two ultimate responses to a pull request.
But before coming to one of these decisions, you will likely want to review the changes in more detail.
Under the Files tab, you can view all changes that would potentially be merged if the pull request is completed:
There, you will also see a button called Review changes that contains three options for leaving a review:
Comment:
Submit review
Approve:
Submit review
Request changes:
Select this option to request further changes before merging
Submit review
The reviewer status will be changed to
You will see that the merge box on the main pull request page is outlined in orange, along with a list of reviewers who requested changes:
To respond to the change request from each reviewer, there are three options:
Approve changes: The reviewer can select this to resolve the change request
See review instead
Dismiss review: The review can be dismissed by anyone
Re-request review: Another review from the reviewer can be requested
Note that the merge box outline color and reviewer status do not affect the ability to merge the pull request
.gitignore file
What is a .gitignore file? (gitignore documentation)
Each line in a .gitignore file specifies a pattern to ignore (more below)
Patterns are fnmatch style patterns
Ignored files will not show up under Untracked files when you check git status
.gitignore does not affect files already being tracked
To stop tracking a file, use git rm --cached
The .gitignore file is usually in your project root directory
.gitignore or .gitignore in various subdirectories if you need to ignore different files in different locations
You can create the .gitignore file yourself or click Add .gitignore when you are creating a new repository on GitHub and select the R template from the dropdown menu
Templates for .gitignore can be found here (e.g., the R template)
Credit: How to Make Git Forget Tracked Files Now In gitignore
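Because .gitignore has no effect on files that are already tracked, a file committed by mistake needs git rm --cached before the ignore rule takes hold. Here is a small sketch in a scratch repository; the file name notes.txt is hypothetical.

```shell
set -e
work=$(mktemp -d)                       # scratch directory (hypothetical setup)
cd "$work"
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q demo
cd demo
echo "draft" > notes.txt                # hypothetical file committed by mistake
git add notes.txt
git commit -q -m "track notes.txt"

# Adding the file to .gitignore is not enough while it is still tracked
echo "notes.txt" > .gitignore
echo "edit" >> notes.txt
git status --porcelain                  # notes.txt is still reported as modified

# Remove the file from the index only; the copy on disk is kept
git rm -q --cached notes.txt
git add .gitignore
git commit -q -m "stop tracking notes.txt"

echo "another edit" >> notes.txt
git status --porcelain                  # prints nothing: notes.txt is now ignored
```

Note that the file remains in earlier commits; git rm --cached only stops tracking it from this point forward.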
Pattern formats in .gitignore file:
Lines starting with # are treated as comments
Starting a pattern with ! means do not ignore this pattern
Use \ to escape literal #, !, or trailing spaces
* matches anything except /
? matches any one character except /
Range notation (e.g., [a-z], [0-9]) can be used to match one of the characters in a range
/ at the end will only match directories and not files
/ in the beginning or middle will only match relative to the directory the .gitignore file is in and not any subdirectories
To match in subdirectories as well, add **/ to the start of the pattern
/**/ in the middle of the path matches zero or more directories
Example: Ignoring files by name patterns
Let’s say we have a git repository with the following files and directory structure:
## .
## |____A1.csv
## |____A1.png
## |____A1.tsv
## |____ABC
## | |____README.md
## |____B2.csv
## |____blank.txt
## |____de.csv
When we check git status, all the files are untracked:
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## A1.csv
## A1.png
## A1.tsv
## ABC/README.md
## B2.csv
## de.csv
##
## nothing added to commit but untracked files present (use "git add" to track)
Let’s create a .gitignore file in the root directory. In .gitignore, we can specify which files to ignore:
# Ignores `A1.csv`, `A1.png`, and `A1.tsv`
echo "A1.csv" > .gitignore
echo "A1.png" >> .gitignore
echo "A1.tsv" >> .gitignore
cat .gitignore
## A1.csv
## A1.png
## A1.tsv
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## .gitignore
## ABC/README.md
## B2.csv
## de.csv
##
## nothing added to commit but untracked files present (use "git add" to track)
We can use the wildcard * to match any characters other than /. For example, A* matches all files and directories that start with an A:
# Ignores `A1.csv`, `A1.png`, `A1.tsv`, and `ABC/` directory using `*`
echo "A*" > .gitignore
cat .gitignore
## A*
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## .gitignore
## B2.csv
## de.csv
##
## nothing added to commit but untracked files present (use "git add" to track)
To specify a file or pattern not to match (i.e., not ignore), put ! at the start of the line:
# Ignores all files and directories starting with `A` except `A1.png`
echo "A*" > .gitignore
echo "!A1.png" >> .gitignore
cat .gitignore
## A*
## !A1.png
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## .gitignore
## A1.png
## B2.csv
## de.csv
##
## nothing added to commit but untracked files present (use "git add" to track)
To only match directories, add a trailing / to your pattern:
# Ignores `ABC/` directory only and not files starting with `A`
echo "A*/" > .gitignore
cat .gitignore
## A*/
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## .gitignore
## A1.csv
## A1.png
## A1.tsv
## B2.csv
## de.csv
##
## nothing added to commit but untracked files present (use "git add" to track)
The ? can be used to match any one character that’s not a /:
## A1.?sv
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## .gitignore
## A1.png
## ABC/README.md
## B2.csv
## de.csv
##
## nothing added to commit but untracked files present (use "git add" to track)
Square brackets [] can be used to specify specific characters to match:
## A1.[ct]sv
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## .gitignore
## A1.png
## ABC/README.md
## B2.csv
## de.csv
##
## nothing added to commit but untracked files present (use "git add" to track)
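Ranges can also be specified using square brackets [] to match a range of characters (e.g., alphabetic or numeric). Note that the lowercase range [a-z] matches the uppercase A and B in the output below only because this repository is on a case-insensitive file system (where Git sets core.ignorecase to true); on a case-sensitive file system you would need [A-Z]: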
## [a-z][0-9].csv
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## .gitignore
## A1.png
## A1.tsv
## ABC/README.md
## de.csv
##
## nothing added to commit but untracked files present (use "git add" to track)
Ranges can also be alphanumeric:
# Ignores `A1.csv`, `B2.csv`, and `de.csv` using ranges
echo "[a-z][a-z0-9].csv" > .gitignore
cat .gitignore
## [a-z][a-z0-9].csv
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## .gitignore
## A1.png
## A1.tsv
## ABC/README.md
##
## nothing added to commit but untracked files present (use "git add" to track)
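To test a pattern without reading through git status, git check-ignore prints whichever of the given paths are ignored. The sketch below recreates the example files in a scratch repository; the helper ignored_by is a hypothetical convenience function, and the uppercase range [A-Z] is used so the result does not depend on the file system's case sensitivity.

```shell
set -e
work=$(mktemp -d)                       # scratch directory (hypothetical setup)
cd "$work"
git init -q demo
cd demo
mkdir ABC
touch A1.csv A1.png A1.tsv ABC/README.md B2.csv de.csv

# Hypothetical helper: write one pattern to .gitignore, then print the
# example paths that the pattern causes Git to ignore
ignored_by() {
  echo "$1" > .gitignore
  # check-ignore exits non-zero when nothing is ignored, hence `|| true`
  git check-ignore A1.csv A1.png A1.tsv ABC/README.md B2.csv de.csv || true
}

ignored_by "A1.?sv"           # A1.csv and A1.tsv
ignored_by "A1.[ct]sv"        # A1.csv and A1.tsv
ignored_by "[A-Z][0-9].csv"   # A1.csv and B2.csv
ignored_by "A*/"              # ABC/README.md (directories only)
```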
Example: Ignoring files and nested files
Let’s say we have a git repository with the following files and directory structure:
## .
## |____blank.txt
## |____doc
## | |____README.md
## |____intput
## | |____doc
## | | |____README.md
## |____output
## | |____doc
## | | |____README.md
## | |____plots
## | | |____doc
## | | | |____README.md
## |____README.md
When we check git status, all the README.md files are untracked:
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## README.md
## doc/README.md
## intput/doc/README.md
## output/doc/README.md
## output/plots/doc/README.md
##
## nothing added to commit but untracked files present (use "git add" to track)
Let’s create a .gitignore file in the root directory. If we add README.md to .gitignore, all the README.md files will be ignored:
## README.md
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## .gitignore
##
## nothing added to commit but untracked files present (use "git add" to track)
If we add doc/README.md to the .gitignore file, only the doc/README.md in the project root directory (i.e., where the .gitignore file is located) will be ignored because there’s a / in the middle of the pattern:
# Ignores `doc/README.md` in the root directory where `.gitignore` is located
echo "doc/README.md" > .gitignore
cat .gitignore
## doc/README.md
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## .gitignore
## README.md
## intput/doc/README.md
## output/doc/README.md
## output/plots/doc/README.md
##
## nothing added to commit but untracked files present (use "git add" to track)
Similarly, if we start a pattern with / like /doc, it will only match things in the directory where the .gitignore file is located (i.e., not the /doc folders nested within the subdirectories):
# Ignores `doc/` in the root directory where `.gitignore` is located
echo "/doc" > .gitignore
cat .gitignore
## /doc
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## .gitignore
## README.md
## intput/doc/README.md
## output/doc/README.md
## output/plots/doc/README.md
##
## nothing added to commit but untracked files present (use "git add" to track)
In order to match things in subdirectories, we need to add **/ to the start of the pattern. So **/doc will match /doc in both the directory where .gitignore is located as well as in subdirectories:
# Ignores all `doc/` in both the root directory and within subdirectories
echo "**/doc" > .gitignore
cat .gitignore
## **/doc
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## .gitignore
## README.md
##
## nothing added to commit but untracked files present (use "git add" to track)
Having /**/ in the middle of the path will match zero or more directories:
## output/**/doc
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## .gitignore
## README.md
## doc/README.md
## intput/doc/README.md
##
## nothing added to commit but untracked files present (use "git add" to track)
Having just * in the path will match any one directory:
## output/*/doc
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## .gitignore
## README.md
## doc/README.md
## intput/doc/README.md
## output/doc/README.md
##
## nothing added to commit but untracked files present (use "git add" to track)
This matches all doc/ folders that are inside some arbitrary folder (indicated by *) that’s located in the root directory (i.e., the directory where .gitignore is located):
## */doc
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## .gitignore
## README.md
## doc/README.md
## output/plots/doc/README.md
##
## nothing added to commit but untracked files present (use "git add" to track)
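The nested patterns above can be compared side by side with git ls-files --others --exclude-standard, which lists the untracked files that are not ignored (the same set git status shows under Untracked files). This is a sketch in a scratch repository: untracked_with is a hypothetical helper, and the directory name intput is spelled as in the listing above.

```shell
set -e
work=$(mktemp -d)                       # scratch directory (hypothetical setup)
cd "$work"
git init -q demo
cd demo
mkdir -p doc intput/doc output/doc output/plots/doc
touch README.md doc/README.md intput/doc/README.md \
      output/doc/README.md output/plots/doc/README.md

# Hypothetical helper: write one pattern to .gitignore, then list the
# untracked files that are still visible to Git
untracked_with() {
  echo "$1" > .gitignore
  git ls-files --others --exclude-standard
}

for pattern in "/doc" "**/doc" "output/**/doc" "output/*/doc" "*/doc"; do
  echo "Pattern: $pattern"
  untracked_with "$pattern"
done
```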
Two primary ways people collaborate on GitHub:
What is a fork?
Why use forks?
Credit: Shaumik Daityari
Overview of fork and pull workflow:
Fork the central_repo repository to create your_fork
At this point, the your_fork repository only exists on GitHub
clone the your_fork repository to your local machine
add changes to index/staging area
commit changes to local your_fork repository
push changes to remote your_fork repository
Open a pull request asking that changes from the your_fork repository be incorporated to the main central_repo repository
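The local part of this workflow (clone, add, commit, push) can be rehearsed without GitHub. In the sketch below, bare repositories stand in for central_repo and your_fork, and forking is approximated by a bare clone; these are assumptions made for illustration, since real forks and pull requests are created on GitHub itself.

```shell
set -e
work=$(mktemp -d)                       # scratch directory (hypothetical setup)
cd "$work"
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# central_repo with one commit on main
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo "# central" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "initial commit"
git clone -q --bare seed central_repo.git

# "Forking" is approximated here by a bare clone of central_repo
git clone -q --bare central_repo.git your_fork.git

# clone your_fork to the local machine, then add, commit, and push
git clone -q your_fork.git local_fork
cd local_fork
echo "my change" >> README.md
git add README.md
git commit -q -m "describe my change"
git push -q origin main

# Number of commits a pull request from your_fork would offer central_repo
central_tip=$(git -C "$work/central_repo.git" rev-parse main)
ahead=$(git rev-list --count "$central_tip..main")
echo "$ahead"
```

After the push, your_fork is ahead of central_repo, and central_repo stays unchanged until a pull request is opened and merged on GitHub.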